date:20151021

Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread Eric Dumazet

On Wed, 2015-10-21 at 14:03 +0100, Alan Burlison wrote:
> On 21/10/2015 12:28, Eric Dumazet wrote:
> 
> > This works for me. Please double check your programs
> 
> I have just done so, it works as you say for AF_INET sockets but if you 
> switch to AF_UNIX sockets it does the wrong thing in the way I described.
> 

Oh well. Please try the following :


diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 94f658235fb4..24dec8bb571d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -328,7 +328,8 @@ found:
 
 static inline int unix_writable(struct sock *sk)
 {
-   return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+   return sk->sk_state != TCP_LISTEN &&
+          (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
 }
 
 static void unix_write_space(struct sock *sk)
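
For readers following along, the effect of the patch above can be modelled in userspace. This is only a sketch under stated assumptions: `model_sock`, `model_unix_writable()` and the `MODEL_*` states are made-up stand-ins for the kernel's `struct sock` and its state values, and a plain int replaces the kernel's atomic counter.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's socket states. */
enum model_state { MODEL_ESTABLISHED, MODEL_LISTEN };

/* Hypothetical, trimmed-down model of the fields unix_writable() reads. */
struct model_sock {
	enum model_state state;
	int wmem_alloc;	/* bytes currently queued for sending */
	int sndbuf;	/* send buffer limit */
};

/* Patched check: a listening socket is never reported as writable. */
static bool model_unix_writable(const struct model_sock *sk)
{
	return sk->state != MODEL_LISTEN &&
	       (sk->wmem_alloc << 2) <= sk->sndbuf;
}
```

With the extra state test, a listener is reported as not writable regardless of its (empty) send queue, while connected sockets keep the old buffer-based check.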


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-21 Thread Peter Zijlstra

On Wed, Oct 21, 2015 at 10:01:46PM +0800, pi3orama wrote:
> > On Oct 21, 2015, at 9:49 PM, Peter Zijlstra  wrote:
> > 
> >> On Wed, Oct 21, 2015 at 09:42:12PM +0800, Wangnan (F) wrote:
> >> How can an eBPF program access a !local event:
> >> 
> >> when creating perf event array we don't care which perf event
> >> is for which CPU, so perf program can access any perf event in
> >> that array.
> > 
> > So what is stopping the eBPF thing from calling perf_event_read_local()
> > on a !local event and triggering a kernel splat?
> 
> I can understand the perf_event_read_local() case, but I really can't
> understand what is stopping us from writing to an atomic field belonging
> to a !local perf event.
> Could you please give a further explanation?

I simply do not get how this eBPF stuff works.

Either I have access to !local events and I can hand one to
perf_event_read_local() and cause badness, or I do not have access to
!local events and the whole 'soft enable/disable' thing is simple.

They cannot be both true.

So explain; how does this eBPF stuff work.

Re: [RFC PATCH 5/5] openvswitch: Interface with NAT.

2015-10-21 Thread Florian Westphal

Thomas Graf  wrote:
> On 10/21/15 at 11:34am, Florian Westphal wrote:
> > Jarno Rajahalme  wrote:
> > >  #define OVS_CS_F_REPLY_DIR 0x08 /* Flow is in the reply direction. */
> > >  #define OVS_CS_F_INVALID   0x10 /* Could not track connection. */
> > >  #define OVS_CS_F_TRACKED   0x20 /* Conntrack has occurred. */
> > > +#define OVS_CS_F_SRC_NAT   0x40 /* Packet's source address/port was
> > > +                                   mangled by NAT. */
> > > +#define OVS_CS_F_DST_NAT   0x80 /* Packet's destination address/port
> > > +                                   was mangled by NAT. */
> > 
> > I'm blind -- how does ovs deal with change of output device and the
> > ether dst mac as result of a l3 dst translation?
> 
> I assume you are referring to rewriting of L2 and the forwarding decision
> after NAT. As NAT is performed in combination with conntrack, the packet
> is recirculated and hits the flow table again after NAT. That 2nd
> stage flow must take care of performing L3 by rewriting L2, decrementing
> TTL, etc.

> Is this what you are referring to?

Yes, exactly, thanks for answering my question.

[ in classic bridge netfilter this requires route lookup & neigh stunts
  to deal with the consequences of dnat, i.e.

- route says dst is reachable via some other interface not part of
bridge
- route says that dst is localhost
- route says it's on same bridge, but neigh has no idea what the new
dst mac address is, etc.

I was kinda disappointed to not see similar tur^W hacks ;)

Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-21 Thread David Miller

From: Jiri Benc 
Date: Wed, 21 Oct 2015 16:46:13 +0200

> For example, we could always alloc a new ifindex when moving interfaces
> between name spaces. That would be probably the tiniest race window we
> could get to (still not zero!) but I guess it would break apps that
> assume that ifindex doesn't change when moving interfaces between name
> spaces (which is not true, such apps are already broken, they just
> happen to work in 99% of cases). The second best solution that doesn't
> break those apps at the cost of leaving the race window wider, is this
> patch.

As you say the apps are broken, so file a bug and have them fixed.

The assumption is clearly invalid, so apps cannot make such an
assumption.

Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-21 Thread pi3orama



Sent from my iPhone

> On Oct 21, 2015, at 9:49 PM, Peter Zijlstra  wrote:
> 
>> On Wed, Oct 21, 2015 at 09:42:12PM +0800, Wangnan (F) wrote:
>> How can an eBPF program access a !local event:
>> 
>> when creating perf event array we don't care which perf event
>> is for which CPU, so perf program can access any perf event in
>> that array.
> 
> So what is stopping the eBPF thing from calling perf_event_read_local()
> on a !local event and triggering a kernel splat?

I can understand the perf_event_read_local() case, but I really can't understand
what is stopping us from writing to an atomic field belonging to a !local perf event.
Could you please give a further explanation?

Thank you.
> 


Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread Alan Burlison


On 21/10/2015 04:49, Al Viro wrote:

Firstly, thank you for the comprehensive and considered reply.


> Refcount is an implementation detail, of course.  However, in any Unix I know
> of, there are two separate notions - descriptor losing connection to opened
> file (be it from close(), exit(), execve(), dup2(), etc.) and opened file
> getting closed.


Yep, it's an implementation detail inside the kernel - Solaris also has 
a refcount inside its vnodes. However that's really only dimly visible 
at the process level, where all you have is an integer file ID.



> The latter cannot happen while there are descriptors connected to the
> file in question, of course.  However, that is not the only thing
> that might prevent an opened file from getting closed - e.g. sending an
> SCM_RIGHTS datagram with attached descriptor connected to the opened file
> in question *at* *the* *moment* *of* *sendmsg(2)* will carry said opened
> file until it is successfully received or discarded (in the former case
> recipient will get a new descriptor referring to that opened file, of course).
> Having the original descriptor closed right after sendmsg(2) does *not*
> do anything to opened file.  On any Unix that implements descriptor-passing.


I believe async IO data is another way that a file can remain live after 
a close(), from the close() section of IEEE Std 1003.1:


"An I/O operation that is not canceled completes as if the close() 
operation had not yet occurred"



> There's going to be a notion of "last close"; that's what this refcount is
> about and _that_ is more than implementation detail.


Yes, POSIX distinguishes between "file descriptor" and "file 
description" (ugh!) and the close() page says:


"When all file descriptors associated with an open file description have 
been closed, the open file description shall be freed."


In the context of this discussion I believe it's the behaviour of the 
integer file descriptor that's the issue. Once it's had close() called 
on it then it's invalid, and any IO on it should fail, even if the 
underlying file description is still 'live'.



> In other words, is that destruction of
>   * any descriptor referring to this socket [utterly insane for obvious
>     reasons]
>   * the last descriptor referring to this socket (modulo descriptor
>     passing, etc.) [a bitch to implement, unless we treat a syscall in progress
>     as keeping the opened file open], or
>   * _the_ descriptor used to issue accept(2) [a bitch to implement,
>     with a lot of fun races in an already race-prone area]?


From reading the POSIX close() page I believe the second option is the 
correct one.



> Additional question is whether it's
>   * just a magical behaviour of close(2) [ugly], or
>   * something that happens when descriptor gets dissociated from
>     opened file [obviously more consistent]?


The second, I believe.


> BTW, for real fun, consider this:
> 7)
> // fd is a socket
> fd2 = dup(fd);
> in thread A: accept(fd);
> in thread B: accept(fd);
> in thread C: accept(fd2);
> in thread D: close(fd);
>
> Which threads (if any), should get hit where it hurts?


A & B should return from the accept with an error. C should continue. 
Which is what happens on Solaris.



> I have no idea what semantics does Solaris have in that area and how racy
> their descriptor table handling is.  And no, I'm not going to RTFS their
> kernel, CDDL being what it is.


I can answer that for you :-) I've looked through the appropriate bits 
of the Solaris kernel code and my colleague Casper has written an 
excellent summary of what happens, so with his permission I've just 
copied it verbatim below:


--
Since at least Solaris 7 (1998), a thread which is sleeping
on a file descriptor which is being closed by another thread,
will be woken up.

To this end each thread keeps a list of file descriptors
in use by the current active system call.

When a file descriptor is closed and this file descriptor
is marked as being in use by other threads, the kernel
will search all threads to see which have this file descriptor
listed as in use. For each such thread, the kernel tells
the thread that its active fds list is now stale and, if
possible, makes the thread run.

While this algorithm is pretty expensive, it is not often invoked.

The thread running close() will NOT return until all other threads
using that file descriptor have released it.

When run, the thread will return from its syscall and will in most cases
return EBADF. A second thread trying to close this same file descriptor
may return earlier with close() returning EBADF.
--
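
The bookkeeping described in the summary above can be sketched as a small single-threaded model. Everything here is an assumption made for illustration (`model_thread`, `model_syscall_enter()`, `model_close()` and `MODEL_MAX_ACTIVE` are hypothetical names; no locking or real wakeup is modelled), and it is not taken from the Solaris sources.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ACTIVE 4

/* Hypothetical per-thread record: fds used by the syscall in progress. */
struct model_thread {
	int active_fds[MODEL_MAX_ACTIVE];
	size_t n_active;
	bool stale;	/* set when one of the active fds has been closed */
};

/* A thread enters a blocking syscall (e.g. accept) on fd. */
static bool model_syscall_enter(struct model_thread *t, int fd)
{
	if (t->n_active == MODEL_MAX_ACTIVE)
		return false;
	t->active_fds[t->n_active++] = fd;
	return true;
}

/*
 * close(fd): scan every thread and mark the users of fd as stale.
 * Returns the number of threads that would be woken up.
 */
static size_t model_close(struct model_thread *threads, size_t n, int fd)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < threads[i].n_active; j++) {
			if (threads[i].active_fds[j] == fd) {
				threads[i].stale = true;
				woken++;
				break;
			}
		}
	}
	return woken;
}
```

Applied to the dup() scenario earlier in this mail, threads A and B (sleeping on fd) are marked stale while thread C (sleeping on fd2) is left alone, matching the behaviour described for Solaris. The scan over all threads is also what makes the operation expensive.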

--
Alan Burlison

Re: [PATCH v2 1/1] xen-netfront: update num_queues to real created

2015-10-21 Thread David Miller

From: Joe Jin 
Date: Mon, 19 Oct 2015 13:37:17 +0800

> Sometimes xennet_create_queues() may fail to create all requested
> queues; we need to update num_queues to the number actually created
> to avoid a NULL pointer dereference.
> 
> Signed-off-by: Joe Jin 

Applied.

Re: [PATCH net-next 0/7] RACK loss detection

2015-10-21 Thread David Miller

From: Yuchung Cheng 
Date: Fri, 16 Oct 2015 21:57:40 -0700

> RACK (Recent ACK) loss recovery uses the notion of time instead of
> packet sequence (FACK) or counts (dupthresh).
> 
> It's inspired by the FACK heuristic in tcp_mark_lost_retrans(): when a
> limited transmit (new data packet) is sacked in recovery, then any
> retransmission sent before that newly sacked packet was sent must have
> been lost, since at least one round trip time has elapsed.
> 
> But that existing heuristic from tcp_mark_lost_retrans()
> has several limitations:
>   1) it can't detect tail drops since it depends on limited transmit
>   2) it's disabled upon reordering (assumes no reordering)
>   3) it's only enabled in fast recovery but not timeout recovery
> 
> RACK addresses these limitations with a core idea: an unacknowledged
> packet P1 is deemed lost if a packet P2 that was sent later is
> s/acked, since at least one round trip has passed.
> 
> Since RACK cares about the time sequence instead of the data sequence
> of packets, it can detect tail drops when a later retransmission is
> s/acked, while FACK or dupthresh can't. For reordering RACK uses a
> dynamically adjusted reordering window ("reo_wnd") to reduce false
> positives on every (small) degree of reordering, similar to the delayed
> Early Retransmit.
> 
> In the current patch set RACK is only a supplemental loss detection
> and does not trigger fast recovery. However we are developing RACK
> to replace or consolidate FACK/dupthresh, early retransmit, and
> thin-dupack. These heuristics all implicitly bear the time notion.
> For example, the delayed Early Retransmit is simply applying RACK
> to trigger the fast recovery with small inflight.
> 
> RACK requires measuring the minimum RTT. Tracking a global min is less
> robust due to traffic engineering pathing changes. Therefore it uses a
> windowed filter by Kathleen Nichols. The min RTT can also be useful
> for various other purposes like congestion control or stat monitoring.
> 
> This patch has been used on Google servers for well over 1 year. RACK
> has also been implemented in the QUIC protocol. We are submitting an
> IETF draft as well.

This looks really great, in fact in my eyes the entire series is
justified merely by patch #3 :-)

Series applied, thanks.
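
For reference, the core idea from the quoted cover letter can be written as a time-based predicate. This is a simplified sketch, not the code from the series: `model_rack_is_lost()` and its parameters are hypothetical names, timestamps are assumed to be plain microsecond counters, and `reo_wnd_us` stands in for the reordering window.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An unacknowledged packet is deemed lost if some packet sent later
 * has already been (s)acked and the gap between their send times
 * exceeds the reordering window.
 */
static bool model_rack_is_lost(uint64_t pkt_sent_us,
			       uint64_t latest_acked_sent_us,
			       uint64_t reo_wnd_us)
{
	/* Nothing sent after this packet has been acked yet. */
	if (latest_acked_sent_us <= pkt_sent_us)
		return false;

	return latest_acked_sent_us - pkt_sent_us > reo_wnd_us;
}
```

Because the test compares send times rather than sequence numbers, it also covers a retransmission that is acked after the tail of the original flight was dropped.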

Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-21 Thread Wangnan (F)




On 2015/10/21 20:17, Peter Zijlstra wrote:

> On Wed, Oct 21, 2015 at 07:49:34PM +0800, Wangnan (F) wrote:
>
>> If our task is sampling cycle events during a function is running,
>> and if two cores start that function overlap:
>>
>> Time:   ...A
>> Core 0: sys_write\
>>    \
>> \
>> Core 1: sys_write%return
>> Core 2: sys_write
>>
>> Then without counter at time A it is highly possible that
>> BPF program on core 1 and core 2 get conflict with each other.
>> The final result is we make some of those events be turned on
>> and others turned off. Using atomic counter can avoid this
>> problem.
>
> But but, how and why can an eBPF program access a !local event? I
> thought we had hard restrictions on that.


How can an eBPF program access a !local event:

When creating a perf event array we don't care which perf event
is for which CPU, so a perf program can access any perf event in
that array, which is straightforward. And for soft
disabling/enabling, all that needs to be done is an atomic
operation on something in the 'perf_event' structure, which is safe
enough.
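
One way to picture the atomic operation is a counter that any core may bump. The sketch below is only a userspace model using C11 atomics; `model_event` and the `model_*` helpers are hypothetical names, and the counting policy (output is kept while the count is above zero) is an assumption made for illustration, not a description of the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a counter kept in the perf_event structure. */
struct model_event {
	atomic_int soft_enable;
};

static void model_event_init(struct model_event *e)
{
	atomic_init(&e->soft_enable, 0);
}

/* Called on function entry, possibly from any core. */
static void model_soft_enable(struct model_event *e)
{
	atomic_fetch_add(&e->soft_enable, 1);
}

/* Called on function return, possibly from any core. */
static void model_soft_disable(struct model_event *e)
{
	atomic_fetch_sub(&e->soft_enable, 1);
}

/* Sample output is kept only while at least one caller is inside. */
static bool model_output_enabled(struct model_event *e)
{
	return atomic_load(&e->soft_enable) > 0;
}
```

With a counter instead of a plain on/off flag, an early return on one core does not switch sampling off while another core is still inside the function.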

Why we need an eBPF program to access a !local event:

I think I have explained why we need an eBPF program to access
a !local event. In summary, without this ability we can't
speak the user's language, because users focus on higher-level
concepts (display refreshing, application booting, HTTP
processing...) and 'on which CPU' is not in their vocabulary most
of the time. Without cross-core soft enabling/disabling it is hard
to translate requirements like "start sampling when display refreshing
begins" and "stop sampling when the application has booted" into eBPF
programs and a perf cmdline. Don't you think it is useful for reducing
sampling data and needs to be considered?

One alternative solution I can imagine is to attach a BPF program
at sampling time, like a kprobe, and return 0 if we don't want sampling
to take action. Thoughts? Honestly, I don't like it very much
because the principle of soft-disable is much simpler and safer, but
if you really like it I think we can try.

Do you think soft-disabling/enabling perf events on other cores causes
any real problem?

Thank you.


Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-21 Thread Peter Zijlstra

On Wed, Oct 21, 2015 at 09:42:12PM +0800, Wangnan (F) wrote:
> How can an eBPF program access a !local event:
> 
> when creating perf event array we don't care which perf event
> is for which CPU, so perf program can access any perf event in
> that array.

So what is stopping the eBPF thing from calling perf_event_read_local()
on a !local event and triggering a kernel splat?


[PATCH 00/15] Fix warnings reported by coccicheck

2015-10-21 Thread Punit Vara

Fix various warnings reported by coccicheck:

make coccicheck M=drivers/net/wireless


Punit Vara (15):
  net: wireless: ath: use | instead of + for summing bitmasks
  net: wireless: ath: Remove unnecessary semicolon
  net: wireless: ath: Remove unnecessary semicolon
  net: wireless: ipw2x00: use | instead of + for summing bitmasks
  net: wireless: ti: Return flow can be simplified for
    wl1271_cmd_interrogate
  net: wireless: rtwifi: Remove duplicated arguments to |
  net: wireless: brcm80211: Remove duplicated arguments to |
  net: wireless: simplify return flow for usb_msg_control
  net: wireless: simplify return flow for zd1201_setconfig16
  net: wireless: ath: simplify return flow for
    carl9170_regwrite_result()
  net: wireless: iwlegacy: Remove unneeded variable ret
  net: wireless: brcm80211: Remove unneeded variable which return 0
  net: wireless: brcm80211: Remove unneeded variable ret_code returning
    0
  net: wireless: ath: Remove unneeded variable ret returning 0
  net: wireless: ath: Remove unneeded variable ret returning 0

 drivers/net/wireless/at76c50x-usb.c   |  5 +
 drivers/net/wireless/ath/ath10k/htt_rx.c  |  2 +-
 drivers/net/wireless/ath/ath10k/pci.c | 10 +-
 drivers/net/wireless/ath/ath10k/wmi.h |  2 +-
 drivers/net/wireless/ath/ath5k/eeprom.c   |  4 ++--
 drivers/net/wireless/ath/carl9170/phy.c   |  7 +--
 drivers/net/wireless/ath/wcn36xx/main.c   |  3 +--
 drivers/net/wireless/brcm80211/brcmsmac/channel.c |  1 -
 drivers/net/wireless/brcm80211/brcmsmac/main.c|  3 +--
 drivers/net/wireless/brcm80211/brcmsmac/stf.c |  5 ++---
 drivers/net/wireless/ipw2x00/libipw_rx.c  |  4 ++--
 drivers/net/wireless/iwlegacy/3945-mac.c  |  5 +
 drivers/net/wireless/rtlwifi/debug.c  |  6 +++---
 drivers/net/wireless/ti/wlcore/acx.c  |  6 +-
 drivers/net/wireless/zd1201.c |  6 +-
 15 files changed, 23 insertions(+), 46 deletions(-)

--
2.5.3


[PATCH 07/15] net: wireless: brcm80211: Remove duplicated arguments to |

2015-10-21 Thread Punit Vara

Remove unnecessary repeated arguments to OR (|).

This is a patch to the brcmsmac/channel.c file that removes the following
warning reported by coccicheck:

-duplicated argument to & or |

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/brcm80211/brcmsmac/channel.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/channel.c b/drivers/net/wireless/brcm80211/brcmsmac/channel.c
index 635ae03..d56fa03 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/channel.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/channel.c
@@ -652,7 +652,6 @@ static void brcms_reg_apply_radar_flags(struct wiphy *wiphy)
 */
if (!(ch->flags & IEEE80211_CHAN_DISABLED))
ch->flags |= IEEE80211_CHAN_RADAR |
-IEEE80211_CHAN_NO_IR |
 IEEE80211_CHAN_NO_IR;
}
 }
-- 
2.5.3
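
As a side note on why the duplicate is harmless with | but would not be with +: OR-ing a flag in twice leaves the mask unchanged, whereas adding it twice carries into another bit. The flag values below are hypothetical and chosen only for illustration; they are not copied from the ieee80211 headers.

```c
/* Hypothetical flag values, for illustration only. */
#define MODEL_CHAN_NO_IR (1u << 1)
#define MODEL_CHAN_RADAR (1u << 3)

/* Duplicated argument to |: redundant, but the result is the same. */
static unsigned int model_flags_or_dup(void)
{
	return MODEL_CHAN_RADAR | MODEL_CHAN_NO_IR | MODEL_CHAN_NO_IR;
}

/* The cleaned-up form. */
static unsigned int model_flags_or(void)
{
	return MODEL_CHAN_RADAR | MODEL_CHAN_NO_IR;
}

/* Summing bitmasks with +: a duplicate corrupts the mask. */
static unsigned int model_flags_add_dup(void)
{
	return MODEL_CHAN_RADAR + MODEL_CHAN_NO_IR + MODEL_CHAN_NO_IR;
}
```

So removing the repeated argument is a pure cleanup with no change in behaviour, while the related "use | instead of +" patches in this series guard against the case where a repeated flag would actually change the value.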


[PATCH 05/15] net: wireless: ti: Return flow can be simplified for wl1271_cmd_interrogate

2015-10-21 Thread Punit Vara

This is a patch to the wlcore/acx.c file that fixes a warning
reported by coccicheck:

WARNING: end returns can be simplified if negative or 0 value

Prefer returning the value directly instead of writing 2-3 extra statements.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ti/wlcore/acx.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/acx.c b/drivers/net/wireless/ti/wlcore/acx.c
index f28fa3b..331382c 100644
--- a/drivers/net/wireless/ti/wlcore/acx.c
+++ b/drivers/net/wireless/ti/wlcore/acx.c
@@ -162,12 +162,8 @@ int wl1271_acx_mem_map(struct wl1271 *wl, struct acx_header *mem_map,
 
wl1271_debug(DEBUG_ACX, "acx mem map");
 
-   ret = wl1271_cmd_interrogate(wl, ACX_MEM_MAP, mem_map,
+   return wl1271_cmd_interrogate(wl, ACX_MEM_MAP, mem_map,
 sizeof(struct acx_header), len);
-   if (ret < 0)
-   return ret;
-
-   return 0;
 }
 
 int wl1271_acx_rx_msdu_life_time(struct wl1271 *wl)
-- 
2.5.3
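
A note on when this kind of simplification preserves behaviour: as the coccicheck message says, it applies when the callee returns a negative or 0 value. The sketch below uses hypothetical helpers (`model_cmd_*`, `model_verbose()`, `model_direct()`) to show that the two forms agree for negative and zero results and differ only if the callee could return a positive value.

```c
/* Hypothetical callees with the three possible kinds of result. */
static int model_cmd_fail(void)     { return -5; }
static int model_cmd_ok(void)       { return 0; }
static int model_cmd_positive(void) { return 3; }

/* Original shape: propagate errors, otherwise return 0. */
static int model_verbose(int (*cmd)(void))
{
	int ret = cmd();

	if (ret < 0)
		return ret;

	return 0;
}

/* Simplified shape: return whatever the callee returned. */
static int model_direct(int (*cmd)(void))
{
	return cmd();
}
```

Whether wl1271_cmd_interrogate() can ever return a positive value is not stated in this thread, so that remains something to check against the driver.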


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread David Miller

From: Alan Burlison 
Date: Wed, 21 Oct 2015 15:38:51 +0100

> While this algorithm is pretty expensive, it is not often invoked.

I bet it can be easily intentionally invoked, by a malicious entity no
less.

Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-21 Thread Jiri Benc

On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
> As you say the apps are broken, so file a bug and have them fixed.
> 
> The assumption is clearly invalid, so apps cannot make such an
> assumption.

Does it mean you would be okay with a patch that always allocates and
assigns a new ifindex in the target netns when interface is moved
between name spaces?

 Jiri

-- 
Jiri Benc

Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-21 Thread David Miller

From: Jiri Benc 
Date: Fri, 16 Oct 2015 13:07:59 +0200

> This of course does not fix the reuse problem for the applications;
> it just makes it less likely to be hit in common usage patterns.

Not only does this not fix the problem, it makes the incentive to fix
that problem much smaller.

Therefore I am not applying this patch, sorry.

Fix the real problem, then come talk to us.

Re: [PATCH net 0/2] net: mv643xx_eth: TSO TX data corruption fixes

2015-10-21 Thread David Miller

From: Philipp Kirchhofer 
Date: Sun, 18 Oct 2015 16:02:42 +0200

> as previously discussed [1] the mv643xx_eth driver has some
> issues with data corruption when using TCP segmentation offload (TSO).
> 
> The following patch set improves this situation by fixing two data
> corruption bugs in the TSO TX path.
> 
> Before applying the patches repeatedly accessing large files located on
> a SMB share on my NSA325 NAS with TSO enabled resulted in different
> hash sums, which confirmed that data corruption is happening during
> file transfer. After applying the patches the hash sums were the same.
> 
> As this is my first patch submission please feel free to point out any
> issues with the patch set.
> 
> [1] http://thread.gmane.org/gmane.linux.network/336530

Series applied, thanks.

Re: [PATCH v3] vsock: fix missing cleanup when misc_register failed

2015-10-21 Thread David Miller

From: Gao feng 
Date: Sun, 18 Oct 2015 23:35:56 +0800

> reset transport and unlock if misc_register failed.
> 
> Signed-off-by: Gao feng 

Applied.

Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag

2015-10-21 Thread David Miller

From: Florian Westphal 
Date: Sat, 17 Oct 2015 22:14:21 +0200

> [ CC netdev since patch #2 isn't nf-specific.  Dave, if you want
>   I can resubmit that one after the next nf-pull request; let me know if
>   you would prefer that ].

No objections to you merging patch #2 however you wish, it's
perfectly fine.

Thanks.

Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-21 Thread Jiri Benc

On Wed, 21 Oct 2015 07:43:32 -0700 (PDT), David Miller wrote:
> Fix the real problem, then come talk to us.

I don't think the real problem is fixable, given that any kind of
unique non-settable identifier would break CRIU. And anything settable
will have the exact same problem. All we can do is narrowing the race
window.

For example, we could always alloc a new ifindex when moving interfaces
between name spaces. That would be probably the tiniest race window we
could get to (still not zero!) but I guess it would break apps that
assume that ifindex doesn't change when moving interfaces between name
spaces (which is not true, such apps are already broken, they just
happen to work in 99% of cases). The second best solution that doesn't
break those apps at the cost of leaving the race window wider, is this
patch.

But whatever, I don't care enough about this.

 Jiri

-- 
Jiri Benc

Re: [PATCH -next] net: hisilicon: Never build on SPARC

2015-10-21 Thread David Miller

From: Guenter Roeck 
Date: Wed, 21 Oct 2015 07:29:33 -0700

> The Hisilicon network driver does not build for Sparc. Enabling
> COMPILE_TEST for it causes Sparc allmodconfig/allyesconfig builds
> to fail with
> 
> drivers/net/ethernet/hisilicon/hns_mdio.c: In function 'hns_mdio_bus_name':
> drivers/net/ethernet/hisilicon/hns_mdio.c:409:3: error:
>   implicit declaration of function 'of_translate_address'
> 
> Fixes: 876133d3161d ("net: hisilicon: add OF dependency")
> Cc: Arnd Bergmann 
> Signed-off-by: Guenter Roeck 

I wish we would really resolve this properly instead of hacking crap
like this all the time, it's stupid.

SPARC simply never needs to "translate" OF addresses, since all OF
resources are fully translated already at boot time during OF tree
import.

All IRQs are fully resolved as well.

So we could simply make of_translate_address() a NOP.

Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

2015-10-21 Thread Daniel Borkmann


On 10/20/2015 08:56 PM, Eric W. Biederman wrote:
...

> Just FYI:  Using a device for this kind of interface is pretty
> much a non-starter as that quickly gets you into situations where
> things do not work in containers.  If someone gets a version of device
> namespaces past GregKH it might be up for discussion to use character
> devices.


Okay, you are referring to this discussion here:

  http://thread.gmane.org/gmane.linux.kernel.containers/26760

What had been mentioned earlier in this thread was to have a namespace
pass-through facility enforced by device cgroups we have in the kernel,
which is one out of various means used to enforce policy today by
deployment systems such as docker, for example. But more below.

I think this all depends on the kind of expectations we have, where all
this is going. In the original proposal, it was agreed to have the
operation that creates a node as 'capable(CAP_SYS_ADMIN)'-only (in the
way like most of the rest of eBPF is restricted), and based on the use
case we distribute such objects to unprivileged applications. But I
understand that it seems the trend lately to lift eBPF restrictions at
some point anyway, and thus the CAP_SYS_ADMIN is suddenly irrelevant
again. Fair enough.

Don't get me wrong, I really don't mind if it will be some version of
this fs patch or whatever architecture else we find consensus on, I
think this discussion is merely trying to evaluate/discuss on what seems
to be a good fit, also in terms of future requirements and integration.

So far, during this discussion, it was proposed to modify the file system
to a single-mount one and to stick this under /sys/kernel/bpf/. This
will not have "real" namespace support either, but it was proposed to
have the following structure:

  /sys/kernel/bpf/username//progX

So, the file system will have kind of a user home-directory for each user
to isolate through permissions, if I understood correctly.

If we really want to go this route, then I think there are no big stones
in the way for the other model either. It should look roughly drafted like
the below.

Together with device cgroups for containers, it would allow scenarios where
you can have:

  * eBPF (map/prog) device pass-through so a map/prog could even be shared out
    from the initial namespace into individual ones/all (one could possibly
    extend such maps as read-only for these consumers).
  * eBPF device creation for unprivileged users with permissions being set
    accordingly (as in fs case).
  * Since cgroup controller can also do wildcards on major/minors, we could
    make that further fine-grained.
  * eBPF device creation can also be enforced by the cgroup controller to be
    entirely disallowed for a specific container.

(An admin can determine the dynamically created major f.e. under /proc/devices.)

FWIW, here's a drafted diff on the idea:

 (https://git.breakpoint.cc/cgit/dborkman/net-next.git/log/?h=ebpf-fds-final6)

 drivers/base/core.c  |  38 +++-
 include/linux/bpf.h  |  39 -
 include/linux/device.h   |  10 +-
 include/uapi/linux/bpf.h |  45 +
 kernel/bpf/Makefile  |   4 +-
 kernel/bpf/core.c|   3 +-
 kernel/bpf/device.c  | 441 +++
 kernel/bpf/syscall.c |  52 +-
 mm/backing-dev.c |   3 +-
 9 files changed, 567 insertions(+), 68 deletions(-)
 create mode 100644 kernel/bpf/device.c

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 334ec7e..11721c8 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1678,7 +1678,8 @@ static void device_create_release(struct device *dev)

 static struct device *
 device_create_groups_vargs(struct class *class, struct device *parent,
-  dev_t devt, void *drvdata,
+  dev_t devt, const struct device_type *type,
+  void *drvdata,
   const struct attribute_group **groups,
   const char *fmt, va_list args)
 {
@@ -1697,6 +1698,7 @@ device_create_groups_vargs(struct class *class, struct device *parent,
device_initialize(dev);
dev->devt = devt;
dev->class = class;
+   dev->type = type;
dev->parent = parent;
dev->groups = groups;
dev->release = device_create_release;
@@ -1743,11 +1745,11 @@ error:
  * been created with a call to class_create().
  */
 struct device *device_create_vargs(struct class *class, struct device *parent,
-  dev_t devt, void *drvdata, const char *fmt,
-  va_list args)
+  dev_t devt, const struct device_type *type,
+  void *drvdata, const char *fmt, va_list args)
 {
-   return device_create_groups_vargs(class, parent, devt, drvdata, NULL,
- fmt, args);
+   return device_create_groups_vargs(class, parent, devt, type, drvdata,
+

[PATCH -next] net: hisilicon: Never build on SPARC

2015-10-21 Thread Guenter Roeck

The Hisilicon network driver does not build for Sparc. Enabling
COMPILE_TEST for it causes Sparc allmodconfig/allyesconfig builds
to fail with

drivers/net/ethernet/hisilicon/hns_mdio.c: In function 'hns_mdio_bus_name':
drivers/net/ethernet/hisilicon/hns_mdio.c:409:3: error:
implicit declaration of function 'of_translate_address'

Fixes: 876133d3161d ("net: hisilicon: add OF dependency")
Cc: Arnd Bergmann 
Signed-off-by: Guenter Roeck 
---
 drivers/net/ethernet/hisilicon/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index f250dec488fd..413935085591 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_HISILICON
bool "Hisilicon devices"
default y
-   depends on OF && (ARM || ARM64 || COMPILE_TEST)
+   depends on OF && (ARM || ARM64 || COMPILE_TEST) && !SPARC
---help---
  If you have a network (Ethernet) card belonging to this class, say Y.
 
-- 
2.1.4


Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag

2015-10-21 Thread Florian Westphal

Pablo Neira Ayuso  wrote:
> > I can then wait for that change to pop up in nf-next and just resend
> > this series (which will then undo that change).
> 
> I'd rather get things fixed for the existing code. This would also
> allow simple passing back to -stable, then we can move forward to discuss
> and review your rework with sufficient time.

Joe, could you take care of this and submit an OVS fix to the net tree?

(just add that call to nf_ct_frag6_consume_orig and take
 the morph change directly into the OVS callpath)

I will then resubmit all of this at some later point.

Thanks.

Re: [PATCH net-next] Adding switchdev ageing notification on port bridged

2015-10-21 Thread David Miller

From: Elad Raz 
Date: Mon, 19 Oct 2015 15:37:25 +0300

> Configure ageing time to the HW for newly bridged device
> 
> CC: Scott Feldman 
> CC: Jiri Pirko 
> Signed-off-by: Elad Raz 

Applied, thanks.

[PATCH 08/15] net: wireless: simplify return flow for usb_control_msg

2015-10-21 Thread Punit Vara

This patch to at76c50x-usb.c fixes the following warning reported by
coccicheck:

WARNING: end returns can be simplified if negative or 0 value

Return the value directly instead of writing 2-3 extra statements.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/at76c50x-usb.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/at76c50x-usb.c 
b/drivers/net/wireless/at76c50x-usb.c
index dab2513..09427f6 100644
--- a/drivers/net/wireless/at76c50x-usb.c
+++ b/drivers/net/wireless/at76c50x-usb.c
@@ -544,13 +544,10 @@ static void at76_ledtrig_tx_activity(void)
 static int at76_remap(struct usb_device *udev)
 {
int ret;
-   ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), 0x0a,
+   return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), 0x0a,
  USB_TYPE_VENDOR | USB_DIR_OUT |
  USB_RECIP_INTERFACE, 0, 0, NULL, 0,
  USB_CTRL_GET_TIMEOUT);
-   if (ret < 0)
-   return ret;
-   return 0;
 }
 
 static int at76_get_op_mode(struct usb_device *udev)
-- 
2.5.3


[PATCH 09/15] net: wireless: simplify return flow for zd1201_setconfig16

2015-10-21 Thread Punit Vara

This patch to zd1201.c fixes the following warning reported by
coccicheck:

WARNING: end returns can be simplified and declaration on line 1658 can
be dropped

Return the value directly instead of writing 2-3 extra statements.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/zd1201.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/wireless/zd1201.c b/drivers/net/wireless/zd1201.c
index 6f5c793..d9e67d9 100644
--- a/drivers/net/wireless/zd1201.c
+++ b/drivers/net/wireless/zd1201.c
@@ -1655,15 +1655,11 @@ static int zd1201_set_maxassoc(struct net_device *dev,
 struct iw_request_info *info, struct iw_param *rrq, char *extra)
 {
struct zd1201 *zd = netdev_priv(dev);
-   int err;
 
if (!zd->ap)
return -EOPNOTSUPP;
 
-   err = zd1201_setconfig16(zd, ZD1201_RID_CNFMAXASSOCSTATIONS, rrq->value);
-   if (err)
-   return err;
-   return 0;
+   return zd1201_setconfig16(zd, ZD1201_RID_CNFMAXASSOCSTATIONS, rrq->value);
 }
 
 static int zd1201_get_maxassoc(struct net_device *dev,
-- 
2.5.3


Re: [PATCH 01/15] net: wireless: ath: use | instead of + for summing bitmasks

2015-10-21 Thread Jiri Slaby

On 10/21/2015, 04:55 PM, Punit Vara wrote:
> This patch is to the ath10k/pci.h file that fixes following warning
>  reported by coccicheck:
> 
> WARNING: sum of probable bitmasks, consider |
> 
> I have replaced + with OR operator | for summing bitmasks
> 
> Signed-off-by: Punit Vara 
> ---
>  drivers/net/wireless/ath/ath10k/pci.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
> b/drivers/net/wireless/ath/ath10k/pci.c
> index 1046ab6..165a318 100644
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -775,7 +775,7 @@ static u32 ath10k_pci_targ_cpu_to_ce_addr(struct ath10k 
> *ar, u32 addr)
>   switch (ar->hw_rev) {
>   case ATH10K_HW_QCA988X:
>   case ATH10K_HW_QCA6174:
> - val = (ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
> + val = (ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
> CORE_CTRL_ADDRESS) &

Could you explain where exactly are 2 bitmasks here?

thanks,
-- 
js
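
To make the question above concrete: '+' and '|' only give the same result when the two operands share no set bits, which holds for independent flag bits but not for a base address plus an arbitrary offset. The sketch below uses made-up constants (EX_REG_BASE and friends are hypothetical and are not the ath10k register values) to show both cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, chosen only for illustration. */
#define EX_REG_BASE      0x00009000u  /* block base address */
#define EX_REG_OFF_CTRL  0x00000008u  /* shares no set bits with the base */
#define EX_REG_OFF_FAR   0x00001004u  /* overlaps bit 12 of the base */

/* Address arithmetic: always yields base plus offset. */
static uint32_t ex_addr_add(uint32_t base, uint32_t off)
{
	return base + off;
}

/* Bitwise OR: equals the sum only when base and off share no set bits. */
static uint32_t ex_addr_or(uint32_t base, uint32_t off)
{
	return base | off;
}

/* Non-zero when replacing '+' with '|' would leave the result unchanged. */
static int ex_or_is_safe(uint32_t base, uint32_t off)
{
	assert(base <= UINT32_MAX - off); /* no wrap-around in this sketch */
	return (base & off) == 0;
}
```

With these values, ex_addr_add() and ex_addr_or() agree for EX_REG_OFF_CTRL but differ for EX_REG_OFF_FAR, so treating an address computation as a "sum of bitmasks" is only safe when the overlap check passes for every value involved.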

[PATCH 06/15] net: wireless: rtwifi: Remove duplicated arguments to |

2015-10-21 Thread Punit Vara

Remove unnecessary repeated arguments COMP_EFUSE, COMP_REGD, COMP_CHAN
to OR (|).

This patch to debug.c removes the following warning
reported by coccicheck:

-duplicated argument to & or |

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/rtlwifi/debug.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/debug.c 
b/drivers/net/wireless/rtlwifi/debug.c
index fd25aba..b8f5540 100644
--- a/drivers/net/wireless/rtlwifi/debug.c
+++ b/drivers/net/wireless/rtlwifi/debug.c
@@ -37,9 +37,9 @@ void rtl_dbgp_flag_init(struct ieee80211_hw *hw)
COMP_BEACON | COMP_RATE | COMP_RXDESC | COMP_DIG | COMP_TXAGC |
COMP_POWER | COMP_POWER_TRACKING | COMP_BB_POWERSAVING | COMP_SWAS |
COMP_RF | COMP_TURBO | COMP_RATR | COMP_CMD |
-   COMP_EFUSE | COMP_QOS | COMP_MAC80211 | COMP_REGD | COMP_CHAN |
-   COMP_EASY_CONCURRENT | COMP_EFUSE | COMP_QOS | COMP_MAC80211 |
-   COMP_REGD | COMP_CHAN | COMP_BT_COEXIST;
+   COMP_EFUSE | COMP_QOS | COMP_MAC80211 | COMP_CHAN |
+   COMP_EASY_CONCURRENT | COMP_QOS | COMP_MAC80211 |
+   COMP_REGD | COMP_BT_COEXIST;
 
 
for (i = 0; i < DBGP_TYPE_MAX; i++)
-- 
2.5.3


[PATCH 12/15] net: wireless: brcm80211: Remove unneeded variable which return 0

2015-10-21 Thread Punit Vara

This patch to brcmsmac/main.c removes an unnecessary variable
that was declared only to return zero.

It fixes the following warning reported by coccicheck:
-Unneeded variable: "err". Return "0" on line 3788

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/brcm80211/brcmsmac/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/main.c 
b/drivers/net/wireless/brcm80211/brcmsmac/main.c
index 9728be0..9d717b6 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/main.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/main.c
@@ -3777,7 +3777,6 @@ static void brcms_c_set_ps_ctrl(struct brcms_c_info *wlc)
  */
 static int brcms_c_set_mac(struct brcms_bss_cfg *bsscfg)
 {
-   int err = 0;
struct brcms_c_info *wlc = bsscfg->wlc;
 
/* enter the MAC addr into the RXE match registers */
@@ -3785,7 +3784,7 @@ static int brcms_c_set_mac(struct brcms_bss_cfg *bsscfg)
 
brcms_c_ampdu_macaddr_upd(wlc);
 
-   return err;
+   return 0;
 }
 
 /* Write the BSS config's BSSID address to core (set_bssid in d11procs.tcl).
-- 
2.5.3


[PATCH 11/15] net: wireless: iwlegacy: Remove unneeded variable ret

2015-10-21 Thread Punit Vara

This patch to 3945-mac.c fixes the following warning reported
by coccicheck:

drivers/net/wireless/iwlegacy/3945-mac.c:247:5-8: Unneeded variable:
"ret". Return "- EOPNOTSUPP" on line 249

Return -EOPNOTSUPP directly instead of returning it through ret.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/iwlegacy/3945-mac.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/iwlegacy/3945-mac.c 
b/drivers/net/wireless/iwlegacy/3945-mac.c
index af1b3e6..ff4dc44 100644
--- a/drivers/net/wireless/iwlegacy/3945-mac.c
+++ b/drivers/net/wireless/iwlegacy/3945-mac.c
@@ -244,9 +244,7 @@ il3945_set_dynamic_key(struct il_priv *il, struct 
ieee80211_key_conf *keyconf,
 static int
 il3945_remove_static_key(struct il_priv *il)
 {
-   int ret = -EOPNOTSUPP;
-
-   return ret;
+   return -EOPNOTSUPP;
 }
 
 static int
@@ -529,7 +527,6 @@ il3945_tx_skb(struct il_priv *il,
if (unlikely(tid >= MAX_TID_COUNT))
goto drop;
}
-
/* Descriptor for chosen Tx queue */
txq = &il->txq[txq_id];
q = &txq->q;
-- 
2.5.3


Re: [PATCH 08/15] net: wireless: simplify return flow for usb_control_msg

2015-10-21 Thread Jiri Slaby

On 10/21/2015, 04:55 PM, Punit Vara wrote:
> @@ -544,13 +544,10 @@ static void at76_ledtrig_tx_activity(void)
>  static int at76_remap(struct usb_device *udev)
>  {
>   int ret;
> - ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), 0x0a,
> + return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), 0x0a,
> USB_TYPE_VENDOR | USB_DIR_OUT |
> USB_RECIP_INTERFACE, 0, 0, NULL, 0,
> USB_CTRL_GET_TIMEOUT);
> - if (ret < 0)
> - return ret;
> - return 0;

ret is now unused, right?

-- 
js
suse labs
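
For reference, the two return shapes under discussion can be compared in a small stand-alone sketch. ex_transfer() is a hypothetical stand-in for a routine like usb_control_msg(), which returns the number of bytes transferred on success or a negative error code; none of these names come from the driver.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for a transfer routine: returns the number of
 * bytes transferred (>= 0) on success, or a negative errno on failure.
 */
static int ex_transfer(int len, int fail)
{
	assert(len >= 0);
	return fail ? -EIO : len;
}

/* Original shape: needs a local and normalises every success to 0. */
static int ex_remap_checked(int len, int fail)
{
	int ret;

	ret = ex_transfer(len, fail);
	if (ret < 0)
		return ret;
	return 0;
}

/* Simplified shape: no local variable, result passed straight through. */
static int ex_remap_direct(int len, int fail)
{
	return ex_transfer(len, fail);
}
```

The two shapes agree for errors and for zero-length transfers, which is the case in the quoted hunk (the length argument is 0). For a non-zero length the direct form would return a positive byte count, so the simplification is not automatically equivalent elsewhere. Either way, the direct form leaves no use for the local variable.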

Re: [PATCH -next] net: hisilicon: Never build on SPARC

2015-10-21 Thread Guenter Roeck


Hi Arnd,

On 10/21/2015 07:39 AM, Arnd Bergmann wrote:
> On Wednesday 21 October 2015 07:29:33 Guenter Roeck wrote:
>> The Hisilicon network driver does not build for Sparc. Enabling
>> COMPILE_TEST for it causes Sparc allmodconfig/allyesconfig builds
>> to fail with
>>
>> drivers/net/ethernet/hisilicon/hns_mdio.c: In function 'hns_mdio_bus_name':
>> drivers/net/ethernet/hisilicon/hns_mdio.c:409:3: error:
>>  implicit declaration of function 'of_translate_address'
>
> I see.
>
>> Fixes: 876133d3161d ("net: hisilicon: add OF dependency")
>> Cc: Arnd Bergmann
>> Signed-off-by: Guenter Roeck
>> ---
>>   drivers/net/ethernet/hisilicon/Kconfig | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
>> index f250dec488fd..413935085591 100644
>> --- a/drivers/net/ethernet/hisilicon/Kconfig
>> +++ b/drivers/net/ethernet/hisilicon/Kconfig
>> @@ -5,7 +5,7 @@
>>   config NET_VENDOR_HISILICON
>>  bool "Hisilicon devices"
>>  default y
>> -   depends on OF && (ARM || ARM64 || COMPILE_TEST)
>> +   depends on OF && (ARM || ARM64 || COMPILE_TEST) && !SPARC
>>  ---help---
>>    If you have a network (Ethernet) card belonging to this class, say Y.
>
> This looks fragile to me. Checking the declaration of of_translate_address,
> I see now that it actually depends on CONFIG_OF_ADDRESS, which is defined using
> "depends on !SPARC && HAS_IOMEM". This means we would get the same problem on
> SCORE, Tile, and UML.
>
> How about this version?
>
> diff --git a/include/linux/of_address.h b/include/linux/of_address.h
> index d88e81be6368..f2f7986cac45 100644
> --- a/include/linux/of_address.h
> +++ b/include/linux/of_address.h
> @@ -57,6 +57,11 @@ extern int of_dma_get_range(struct device_node *np, u64 *dma_addr,
> u64 *paddr, u64 *size);
>   extern bool of_dma_is_coherent(struct device_node *np);
>   #else /* CONFIG_OF_ADDRESS */
> +static inline u64 of_translate_address(struct device_node *np, const __be32 *addr)
> +{
> +   return 0;

Maybe return OF_BAD_ADDR ?

> +}
> +
>   static inline struct device_node *of_find_matching_node_by_address(
> struct device_node *from,
> const struct of_device_id *matches,
>
> It looks like it's in line with the other wrappers here. Alternatively,
> we could decide to use CONFIG_OF_ADDRESS instead of CONFIG_OF as the dependency.


You are right, both of those would be better than my patch.
My preference would be to introduce the dummy function. This would solve
the problem for good (it isn't the first time this happens).

Are you going to submit that patch ?

Thanks,
Guenter
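
The dummy-function approach preferred above follows a common header pattern: declare the real function when the feature is configured in, and otherwise provide a static inline stub that returns a sentinel the caller can check. A stand-alone sketch, with hypothetical names (EX_HAVE_ADDR_TRANSLATION stands in for CONFIG_OF_ADDRESS, and EX_BAD_ADDR mirrors the all-ones value the kernel uses for OF_BAD_ADDR):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All bits set, mirroring the kernel's OF_BAD_ADDR sentinel. */
#define EX_BAD_ADDR ((uint64_t)-1)

/* Define EX_HAVE_ADDR_TRANSLATION to model CONFIG_OF_ADDRESS=y. */
#ifdef EX_HAVE_ADDR_TRANSLATION
uint64_t ex_translate_address(const uint32_t *cells);
#else
/* Stub for configurations without address translation. */
static inline uint64_t ex_translate_address(const uint32_t *cells)
{
	(void)cells;
	return EX_BAD_ADDR;
}
#endif

/* Caller builds in both configurations and checks for the sentinel. */
static int ex_map_device(const uint32_t *cells, uint64_t *out)
{
	uint64_t addr;

	assert(out != NULL);
	addr = ex_translate_address(cells);
	if (addr == EX_BAD_ADDR)
		return -1;
	*out = addr;
	return 0;
}
```

Returning the sentinel rather than 0 matters here: 0 can be a legitimate translated address, so a caller could not tell a stubbed-out translation apart from a real one.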


[PATCH 01/15] net: wireless: ath: use | instead of + for summing bitmasks

2015-10-21 Thread Punit Vara

This patch to ath10k/pci.c fixes the following warning
reported by coccicheck:

WARNING: sum of probable bitmasks, consider |

I have replaced + with the OR operator | where bitmasks are summed.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/ath10k/pci.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
b/drivers/net/wireless/ath/ath10k/pci.c
index 1046ab6..165a318 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -775,7 +775,7 @@ static u32 ath10k_pci_targ_cpu_to_ce_addr(struct ath10k 
*ar, u32 addr)
switch (ar->hw_rev) {
case ATH10K_HW_QCA988X:
case ATH10K_HW_QCA6174:
-   val = (ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
+   val = (ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
  CORE_CTRL_ADDRESS) &
   0x7ff) << 21;
break;
@@ -1443,10 +1443,10 @@ static void ath10k_pci_irq_msi_fw_mask(struct ath10k 
*ar)
switch (ar->hw_rev) {
case ATH10K_HW_QCA988X:
case ATH10K_HW_QCA6174:
-   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
+   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
CORE_CTRL_ADDRESS);
val &= ~CORE_CTRL_PCIE_REG_31_MASK;
-   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS +
+   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS |
   CORE_CTRL_ADDRESS, val);
break;
case ATH10K_HW_QCA99X0:
@@ -1464,10 +1464,10 @@ static void ath10k_pci_irq_msi_fw_unmask(struct ath10k 
*ar)
switch (ar->hw_rev) {
case ATH10K_HW_QCA988X:
case ATH10K_HW_QCA6174:
-   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
+   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
CORE_CTRL_ADDRESS);
val |= CORE_CTRL_PCIE_REG_31_MASK;
-   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS +
+   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS |
   CORE_CTRL_ADDRESS, val);
break;
case ATH10K_HW_QCA99X0:
-- 
2.5.3


[PATCH 02/15] net: wireless: ath: Remove unnecessary semicolon

2015-10-21 Thread Punit Vara

This patch to htt_rx.c removes an unneeded semicolon
reported by coccicheck.

The semicolon only creates an empty statement, so remove it.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 1b7a043..002a633 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2077,7 +2077,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct 
sk_buff *skb)
ath10k_dbg_dump(ar, ATH10K_DBG_HTT_DUMP, NULL, "htt event: ",
skb->data, skb->len);
break;
-   };
+   }
 
/* Free the indication buffer */
dev_kfree_skb_any(skb);
-- 
2.5.3


[PATCH 04/15] net: wireless: ipw2x00: use | instead of + for summing bitmasks

2015-10-21 Thread Punit Vara

This patch to libipw_rx.c fixes the following warning
reported by coccicheck:

WARNING: sum of probable bitmasks, consider |

I have replaced + with OR operator | for summing bitmasks

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ipw2x00/libipw_rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ipw2x00/libipw_rx.c 
b/drivers/net/wireless/ipw2x00/libipw_rx.c
index cef7f7d..310b2ff 100644
--- a/drivers/net/wireless/ipw2x00/libipw_rx.c
+++ b/drivers/net/wireless/ipw2x00/libipw_rx.c
@@ -875,7 +875,7 @@ void libipw_rx_any(struct libipw_device *ieee,
case IW_MODE_ADHOC:
/* our BSS and not from/to DS */
if (ether_addr_equal(hdr->addr3, ieee->bssid))
-   if ((fc & (IEEE80211_FCTL_TODS+IEEE80211_FCTL_FROMDS)) == 0) {
+   if ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) == 0) {
/* promisc: get all */
if (ieee->dev->flags & IFF_PROMISC)
is_packet_for_us = 1;
@@ -890,7 +890,7 @@ void libipw_rx_any(struct libipw_device *ieee,
case IW_MODE_INFRA:
/* our BSS (== from our AP) and from DS */
if (ether_addr_equal(hdr->addr2, ieee->bssid))
-   if ((fc & (IEEE80211_FCTL_TODS+IEEE80211_FCTL_FROMDS)) == IEEE80211_FCTL_FROMDS) {
+   if ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) == IEEE80211_FCTL_FROMDS) {
/* promisc: get all */
if (ieee->dev->flags & IFF_PROMISC)
is_packet_for_us = 1;
-- 
2.5.3


[PATCH 03/15] net: wireless: ath: Remove unnecessary semicolon

2015-10-21 Thread Punit Vara

This patch to ath10k/wmi.h removes an unneeded semicolon
reported by coccicheck.

The semicolon only creates an empty statement, so remove it.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.h 
b/drivers/net/wireless/ath/ath10k/wmi.h
index 52d3503..21d5b6b 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -1675,7 +1675,7 @@ static inline const char *ath10k_wmi_phymode_str(enum 
wmi_phy_mode mode)
 
/* no default handler to allow compiler to check that the
 * enum is fully handled */
-   };
+   }
 
return "";
 }
-- 
2.5.3


pull-request: mac80211-next 2015-10-21

2015-10-21 Thread Johannes Berg

Hi,

Here's another, likely final, pull request for -next. I finally caved
in and cleaned up the regulatory code a bit, which is the bulk of the
changes.

There's a new Kconfig which allows turning off CRDA, but it's hidden
behind having the internal regdb enabled, which in turn is hidden
behind EXPERT, so hopefully you won't even see that :) (although I am
hoping to get rid of the internal regdb sooner rather than later)

Let me know if you see any problems.

Thanks,
johannes




The following changes since commit 6623c60dc28ee966cd85c6f12aa2fc3c952d0179:

  bridge: vlan: enforce no pvid flag in vlan ranges (2015-10-12 19:59:15 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git 
tags/mac80211-next-for-davem-2015-10-21

for you to fetch changes up to e5a9f8d04660da7ef3a98260aa74c3976f9cb4cd:

  mac80211: move station statistics into sub-structs (2015-10-21 10:08:22 +0200)


Here's another set of patches for the current cycle:
 * I merged net-next back to avoid a conflict with the
 * cfg80211 scheduled scan API extensions
 * preparations for better scan result timestamping
 * regulatory cleanups
 * mac80211 statistics cleanups
 * a few other small cleanups and fixes


Avraham Stern (2):
  cfg80211: Add multiple scan plans for scheduled scan
  mac80211: Do not restart scheduled scan if multiple scan plans are set

Dmitry Shmidt (1):
  nl80211: allow BSS data to include CLOCK_BOOTTIME timestamp

Felix Fietkau (1):
  mac80211: add missing struct ieee80211_txq tid field initialization

Johannes Berg (25):
  Merge remote-tracking branch 'net-next/master' into mac80211-next
  wireless: update robust action frame list
  wireless: add WNM action frame categories
  mac80211: use new cfg80211_inform_bss_frame_data() API
  mac80211: remove PM-QoS listener
  mac80211: clean up ieee80211_rx_h_check_dup code
  mac80211: move sta_set_rate_info_rx() and make it static
  mac80211: remove cfg.h
  mac80211: remove event.c
  cfg80211: fix gHz to GHz
  cfg80211: reg: remove useless non-NULL check
  cfg80211: reg: fix reg_call_crda() return value bug
  cfg80211: reg: rename reg_call_crda to reg_query_database
  cfg80211: reg: search built-in database directly
  cfg80211: reg: remove useless reg_timeout scheduling
  cfg80211: reg: make CRDA support optional
  cfg80211: reg: rename reg_regdb_query() to reg_query_builtin()
  cfg80211: reg: clarify 'treatment' handling in reg_process_hint()
  cfg80211: reg: centralize freeing ignored requests
  cfg80211: reg: fix antenna gain in chan_reg_rule_print_dbg()
  cfg80211: reg: reduce chan_reg_rule_print_dbg() ifdef
  cfg80211: reg: fix reg_ignore_cell_hint return type
  mac80211: remove sta->last_ack_signal
  mac80211: move beacon_loss_count into ifmgd
  mac80211: move station statistics into sub-structs

Tamizh chelvam (1):
  Revert "mac80211: remove exposing 'mfp' to drivers"

 Documentation/DocBook/80211.tmpl   |   5 +-
 drivers/net/wireless/ath/ath6kl/cfg80211.c |   2 +-
 drivers/net/wireless/iwlwifi/mvm/ops.c |   1 +
 drivers/net/wireless/iwlwifi/mvm/scan.c|   4 +-
 drivers/net/wireless/rt2x00/rt2x00config.c |   2 +-
 drivers/net/wireless/ti/wl12xx/scan.c  |   3 +-
 drivers/net/wireless/ti/wl18xx/scan.c  |   8 +-
 include/linux/ieee80211.h  |   5 +
 include/net/cfg80211.h | 126 ++---
 include/net/mac80211.h |   8 +-
 include/uapi/linux/nl80211.h   |  57 +-
 net/mac80211/Makefile  |   1 -
 net/mac80211/cfg.c |  45 +
 net/mac80211/cfg.h |   9 -
 net/mac80211/debugfs_sta.c |   8 +-
 net/mac80211/ethtool.c |  29 ++-
 net/mac80211/event.c   |  27 ---
 net/mac80211/ibss.c|  24 +--
 net/mac80211/ieee80211_i.h |  13 +-
 net/mac80211/iface.c   |   4 +-
 net/mac80211/main.c|  14 --
 net/mac80211/mesh_hwmp.c   |   2 +-
 net/mac80211/mesh_plink.c  |   6 +-
 net/mac80211/mlme.c|  85 +++--
 net/mac80211/ocb.c |   2 +-
 net/mac80211/rx.c  |  81 
 net/mac80211/scan.c|  20 +-
 net/mac80211/sta_info.c| 101 +++---
 net/mac80211/sta_info.h| 103 --
 net/mac80211/status.c  |  53 +++---
 net/mac80211/trace.h   |   2 -
 net/mac80211/tx.c  |  20 +-
 net/mac80211/util.c|  12 +-
 net/mac80211/wpa.c

Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-21 Thread pi3orama

Sent from my iPhone

> On Oct 21, 2015, at 10:09 PM, Peter Zijlstra wrote:
> 
> On Wed, Oct 21, 2015 at 10:01:46PM +0800, pi3orama wrote:
>>> On Oct 21, 2015, at 9:49 PM, Peter Zijlstra wrote:
>>> 
>>>> On Wed, Oct 21, 2015 at 09:42:12PM +0800, Wangnan (F) wrote:
>>>> How can an eBPF program access a !local event:
>>>> 
>>>> when creating perf event array we don't care which perf event
>>>> is for which CPU, so perf program can access any perf event in
>>>> that array.
>>> 
>>> So what is stopping the eBPF thing from calling perf_event_read_local()
>>> on a !local event and triggering a kernel splat?
>> 
>> I can understand the perf_event_read_local() case, but I really can't
>> understand what is stopping us to write to an atomic field belong to a
>> !local perf event. Could you please give a further explanation?
> 
> I simply do not get how this eBPF stuff works.
> 
> Either I have access to !local events and I can hand one to
> perf_event_read_local() and cause badness, or I do not have access to
> !local events and the whole 'soft enable/disable' thing is simple.
> 
> They cannot be both true.
> 
> So explain; how does this eBPF stuff work.

I think I get your point this time, so let me explain how the eBPF side works.

Your concern is that a BPF programmer could break the system like this:

A=get_non_local_perf_event()
perf_event_read_local(A)
BOOM!

However, the above logic is impossible, because a BPF program can't work this
way.

First, a BPF program cannot invoke a kernel function directly. Unlike a kernel
module, a BPF program can only invoke functions designed for it, like the one
this patch adds. So what BPF programs can do is strictly restricted by the
kernel. If we don't want a BPF program to call perf_event_read_local() across
cores, we can check for that and return an error in the function we provide.

Second, a BPF program has no way to access a perf event directly. All perf
events have to be wrapped in a map and accessed through the BPF functions
described above. We don't allow a BPF program to fetch an array element from
that map, so pointers to perf events are safely protected from the BPF program.

In summary, your either-or logic doesn't hold in the BPF world. A BPF program
can only access perf events in a highly restricted way. We don't allow it to
call perf_event_read_local() across cores, so it can't.

Thank you.
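
As a rough user-space model of the argument above (not the kernel code, and every name here is hypothetical): the program only ever passes a map index to a helper, and the helper does all validation, including the CPU locality check, before touching the event.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_EVENTS 4

/* Hypothetical, much simplified model of a perf event bound to one CPU. */
struct ex_event {
	int cpu;        /* CPU the event counts on */
	uint64_t count; /* current counter value */
};

/* Model of the array map: the program only ever names an index. */
struct ex_event_array {
	struct ex_event *slots[EX_MAX_EVENTS];
};

/*
 * Model of a helper exposed to BPF programs. All validation happens
 * here, so a program cannot hand a !local event to the read path.
 */
static int ex_helper_read(const struct ex_event_array *map, unsigned int idx,
			  int this_cpu, uint64_t *val)
{
	const struct ex_event *ev;

	assert(map != NULL && val != NULL);

	if (idx >= EX_MAX_EVENTS)
		return -E2BIG;
	ev = map->slots[idx];
	if (ev == NULL)
		return -ENOENT;
	if (ev->cpu != this_cpu)
		return -EINVAL; /* refuse cross-CPU reads */
	*val = ev->count;
	return 0;
}
```

In this model the event pointer never leaves the helper, which is the property the explanation relies on: the either-or case (raw access to a !local event) has no way to be expressed by the caller.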

[PATCH 1/8] mm: page_counter: let page_counter_try_charge() return bool

2015-10-21 Thread Johannes Weiner

page_counter_try_charge() currently returns 0 on success and -ENOMEM
on failure, which is surprising behavior given the function name.

Make it follow the expected pattern of try_stuff() functions that
return a boolean true to indicate success, or false for failure.

Signed-off-by: Johannes Weiner 
---
 include/linux/page_counter.h |  6 +++---
 mm/hugetlb_cgroup.c  |  3 ++-
 mm/memcontrol.c  | 11 +--
 mm/page_counter.c| 14 +++---
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 17fa4f8..7e62920 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -36,9 +36,9 @@ static inline unsigned long page_counter_read(struct 
page_counter *counter)
 
 void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_charge(struct page_counter *counter, unsigned long nr_pages);
-int page_counter_try_charge(struct page_counter *counter,
-   unsigned long nr_pages,
-   struct page_counter **fail);
+bool page_counter_try_charge(struct page_counter *counter,
+unsigned long nr_pages,
+struct page_counter **fail);
 void page_counter_uncharge(struct page_counter *counter, unsigned long 
nr_pages);
 int page_counter_limit(struct page_counter *counter, unsigned long limit);
 int page_counter_memparse(const char *buf, const char *max,
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 6a44263..d8fb10d 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -186,7 +186,8 @@ again:
}
rcu_read_unlock();
 
-   ret = page_counter_try_charge(&h_cg->hugepage[idx], nr_pages, &counter);
+   if (!page_counter_try_charge(&h_cg->hugepage[idx], nr_pages, &counter))
+   ret = -ENOMEM;
css_put(&h_cg->css);
 done:
*ptr = h_cg;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c71fe40..a8ccdbc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2018,8 +2018,8 @@ retry:
return 0;
 
if (!do_swap_account ||
-   !page_counter_try_charge(&memcg->memsw, batch, &counter)) {
-   if (!page_counter_try_charge(&memcg->memory, batch, &counter))
+   page_counter_try_charge(&memcg->memsw, batch, &counter)) {
+   if (page_counter_try_charge(&memcg->memory, batch, &counter))
goto done_restock;
if (do_swap_account)
page_counter_uncharge(&memcg->memsw, batch);
@@ -2383,14 +2383,13 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t 
gfp, int order,
 {
unsigned int nr_pages = 1 << order;
struct page_counter *counter;
-   int ret = 0;
+   int ret;
 
if (!memcg_kmem_is_active(memcg))
return 0;
 
-   ret = page_counter_try_charge(&memcg->kmem, nr_pages, &counter);
-   if (ret)
-   return ret;
+   if (!page_counter_try_charge(&memcg->kmem, nr_pages, &counter))
+   return -ENOMEM;
 
ret = try_charge(memcg, gfp, nr_pages);
if (ret) {
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 11b4bed..7c6a63d 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -56,12 +56,12 @@ void page_counter_charge(struct page_counter *counter, 
unsigned long nr_pages)
  * @nr_pages: number of pages to charge
  * @fail: points first counter to hit its limit, if any
  *
- * Returns 0 on success, or -ENOMEM and @fail if the counter or one of
- * its ancestors has hit its configured limit.
+ * Returns %true on success, or %false and @fail if the counter or one
+ * of its ancestors has hit its configured limit.
  */
-int page_counter_try_charge(struct page_counter *counter,
-   unsigned long nr_pages,
-   struct page_counter **fail)
+bool page_counter_try_charge(struct page_counter *counter,
+unsigned long nr_pages,
+struct page_counter **fail)
 {
struct page_counter *c;
 
@@ -99,13 +99,13 @@ int page_counter_try_charge(struct page_counter *counter,
if (new > c->watermark)
c->watermark = new;
}
-   return 0;
+   return true;
 
 failed:
for (c = counter; c != *fail; c = c->parent)
page_counter_cancel(c, nr_pages);
 
-   return -ENOMEM;
+   return false;
 }
 
 /**
-- 
2.6.1
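
The calling convention this patch moves to can be illustrated with a small single-threaded model. This is a sketch under simplifying assumptions, not the kernel implementation: the ex_ names are hypothetical, plain unsigned long replaces the atomic counters, and the limit is checked before charging instead of charging first and backing out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a hierarchical counter. */
struct ex_counter {
	unsigned long count;
	unsigned long limit;
	struct ex_counter *parent;
};

/*
 * Charge nr units to counter and all of its ancestors. Returns true on
 * success. On failure, returns false, stores the counter that hit its
 * limit in *fail, and undoes the charges already made below it.
 */
static bool ex_try_charge(struct ex_counter *counter, unsigned long nr,
			  struct ex_counter **fail)
{
	struct ex_counter *c;

	assert(counter != NULL && fail != NULL);

	for (c = counter; c; c = c->parent) {
		if (c->count + nr > c->limit) {
			*fail = c;
			goto failed;
		}
		c->count += nr;
	}
	return true;

failed:
	for (c = counter; c != *fail; c = c->parent)
		c->count -= nr;
	return false;
}

/* A caller that still reports an errno translates at the call site. */
static int ex_charge(struct ex_counter *counter, unsigned long nr)
{
	struct ex_counter *fail;

	if (!ex_try_charge(counter, nr, &fail))
		return -ENOMEM;
	return 0;
}
```

With a boolean return, `if (!ex_try_charge(...))` reads as "if the charge failed", which is the reading the try_ prefix suggests; callers that need -ENOMEM produce it themselves, as the hugetlb and kmem hunks do.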


[PATCH 3/8] net: consolidate memcg socket buffer tracking and accounting

2015-10-21 Thread Johannes Weiner

The tcp memory controller has extensive provisions for future memory
accounting interfaces that won't materialize after all. Cut the code
base down to what's actually used, now and in the likely future.

- There won't be any different protocol counters in the future, so a
  direct sock->sk_memcg linkage is enough. This eliminates a lot of
  callback maze and boilerplate code, and restores most of the socket
  allocation code to pre-tcp_memcontrol state.

- There won't be a tcp control soft limit, so integrating the memcg
  code into the global skmem limiting scheme complicates things
  unnecessarily. Replace all that with simple and clear charge and
  uncharge calls--hidden behind a jump label--to account skb memory.

- The previous jump label code was an elaborate state machine that
  tracked the number of cgroups with an active socket limit in order
  to enable the skmem tracking and accounting code only when actively
  necessary. But this is overengineered: it was meant to protect the
  people who never use this feature in the first place. Simply enable
  the branches once when the first limit is set until the next reboot.
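
The one-way enabling described in the last point can be sketched in userspace as follows. This is a model, not the patch itself: a plain bool stands in for the jump label, and struct group, group_set_limit(), group_charge() and group_uncharge() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the jump label: a plain flag, off at boot. */
static bool skmem_accounting_enabled;

struct group {
	unsigned long limit;	/* in pages */
	unsigned long charged;	/* in pages */
};

/* Setting any limit switches accounting on; nothing switches it off. */
static void group_set_limit(struct group *g, unsigned long limit)
{
	g->limit = limit;
	skmem_accounting_enabled = true;
}

/* The fast path costs a single flag test until the first limit is set. */
static void group_charge(struct group *g, unsigned long nr_pages)
{
	if (!skmem_accounting_enabled)
		return;
	g->charged += nr_pages;
}

/*
 * Memory charged before accounting was enabled is not tracked in this
 * sketch, so callers must only uncharge what they charged while enabled.
 */
static void group_uncharge(struct group *g, unsigned long nr_pages)
{
	if (!skmem_accounting_enabled)
		return;
	assert(g->charged >= nr_pages);
	g->charged -= nr_pages;
}
```

Systems that never set a limit only ever pay for the flag test, which is the trade-off the description argues for over tracking the number of active limits.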

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h   |  64 ---
 include/net/sock.h   | 135 +++
 include/net/tcp.h|   3 -
 include/net/tcp_memcontrol.h |   7 ---
 mm/memcontrol.c  | 101 +++--
 net/core/sock.c  |  78 ++-
 net/ipv4/sysctl_net_ipv4.c   |   1 -
 net/ipv4/tcp.c   |   3 +-
 net/ipv4/tcp_ipv4.c  |   9 +--
 net/ipv4/tcp_memcontrol.c| 147 +++
 net/ipv4/tcp_output.c|   6 +-
 net/ipv6/tcp_ipv6.c  |   3 -
 12 files changed, 136 insertions(+), 421 deletions(-)
 delete mode 100644 include/net/tcp_memcontrol.h

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 19ff87b..5b72f83 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -85,34 +85,6 @@ enum mem_cgroup_events_target {
MEM_CGROUP_NTARGETS,
 };
 
-/*
- * Bits in struct cg_proto.flags
- */
-enum cg_proto_flags {
-   /* Currently active and new sockets should be assigned to cgroups */
-   MEMCG_SOCK_ACTIVE,
-   /* It was ever activated; we must disarm static keys on destruction */
-   MEMCG_SOCK_ACTIVATED,
-};
-
-struct cg_proto {
-   struct page_counter memory_allocated;   /* Current allocated 
memory. */
-   struct percpu_counter   sockets_allocated;  /* Current number of 
sockets. */
-   int memory_pressure;
-   longsysctl_mem[3];
-   unsigned long   flags;
-   /*
-* memcg field is used to find which memcg we belong directly
-* Each memcg struct can hold more than one cg_proto, so container_of
-* won't really cut.
-*
-* The elegant solution would be having an inverse function to
-* proto_cgroup in struct proto, but that means polluting the structure
-* for everybody, instead of just for memcg users.
-*/
-   struct mem_cgroup   *memcg;
-};
-
 #ifdef CONFIG_MEMCG
 struct mem_cgroup_stat_cpu {
long count[MEM_CGROUP_STAT_NSTATS];
@@ -185,8 +157,15 @@ struct mem_cgroup {
 
/* Accounted resources */
struct page_counter memory;
+
+   /*
+* Legacy non-resource counters. In unified hierarchy, all
+* memory is accounted and limited through memcg->memory.
+* Consumer breakdown happens in the statistics.
+*/
struct page_counter memsw;
struct page_counter kmem;
+   struct page_counter skmem;
 
/* Normal memory consumption range */
unsigned long low;
@@ -246,9 +225,6 @@ struct mem_cgroup {
 */
struct mem_cgroup_stat_cpu __percpu *stat;
 
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET)
-   struct cg_proto tcp_mem;
-#endif
 #if defined(CONFIG_MEMCG_KMEM)
 /* Index in the kmem_cache->memcg_params.memcg_caches array */
int kmemcg_id;
@@ -676,12 +652,6 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum 
vm_event_item idx)
 }
 #endif /* CONFIG_MEMCG */
 
-enum {
-   UNDER_LIMIT,
-   SOFT_LIMIT,
-   OVER_LIMIT,
-};
-
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg);
@@ -707,15 +677,35 @@ static inline void mem_cgroup_wb_stats(struct 
bdi_writeback *wb,
 
 struct sock;
 #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+extern struct static_key_false mem_cgroup_sockets;
+static inline bool mem_cgroup_do_sockets(void)
+{
+   return static_branch_unlikely(&mem_cgroup_sockets);
+}
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+void

[PATCH 2/8] mm: memcontrol: export root_mem_cgroup

2015-10-21 Thread Johannes Weiner

A later patch will need this symbol in files other than memcontrol.c,
so export it now and replace mem_cgroup_root_css at the same time.

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h | 3 ++-
 mm/backing-dev.c   | 2 +-
 mm/memcontrol.c| 5 ++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 805da1f..19ff87b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -275,7 +275,8 @@ struct mem_cgroup {
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
 };
-extern struct cgroup_subsys_state *mem_cgroup_root_css;
+
+extern struct mem_cgroup *root_mem_cgroup;
 
 /**
  * mem_cgroup_events - count memory events against a cgroup
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 095b23b..73ab967 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -702,7 +702,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 
ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
if (!ret) {
-   bdi->wb.memcg_css = mem_cgroup_root_css;
+   bdi->wb.memcg_css = &root_mem_cgroup->css;
bdi->wb.blkcg_css = blkcg_root_css;
}
return ret;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a8ccdbc..e54f434 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -76,9 +76,9 @@
 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
 
+struct mem_cgroup *root_mem_cgroup __read_mostly;
+
 #define MEM_CGROUP_RECLAIM_RETRIES 5
-static struct mem_cgroup *root_mem_cgroup __read_mostly;
-struct cgroup_subsys_state *mem_cgroup_root_css __read_mostly;
 
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
@@ -4213,7 +4213,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
*parent_css)
/* root ? */
if (parent_css == NULL) {
root_mem_cgroup = memcg;
-   mem_cgroup_root_css = &memcg->css;
page_counter_init(&memcg->memory, NULL);
memcg->high = PAGE_COUNTER_MAX;
memcg->soft_limit = PAGE_COUNTER_MAX;
-- 
2.6.1


[PATCH 8/8] mm: memcontrol: hook up vmpressure to socket pressure

2015-10-21 Thread Johannes Weiner

Let the networking stack know when a memcg is under reclaim pressure,
so it can shrink its transmit windows accordingly.

Whenever the reclaim efficiency of a memcg's LRU lists drops low
enough for a MEDIUM or HIGH vmpressure event to occur, assert a
pressure state in the socket and tcp memory code that tells it to
reduce memory usage in sockets associated with said memory cgroup.

vmpressure events are edge triggered, so for hysteresis assert socket
pressure for a second to allow for subsequent vmpressure events to
occur before letting the socket code return to normal.
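
The one-second hysteresis described above can be modelled in userspace like this. It is a sketch under stated assumptions: HZ is fixed at an illustrative value, tick_before() imitates the wraparound-safe comparison that time_before() provides in the kernel, and struct group with its helpers is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 1000UL	/* assumed tick rate for this sketch */

/* Wraparound-safe "a is earlier than b", in the spirit of time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

struct group {
	unsigned long socket_pressure;	/* tick at which pressure expires */
};

/* A new group starts with no pressure asserted. */
static void group_init(struct group *g, unsigned long now)
{
	g->socket_pressure = now;
}

/* Called on a pressure event above LOW: assert pressure for one second. */
static void group_assert_pressure(struct group *g, unsigned long now)
{
	g->socket_pressure = now + HZ;
	/* The state must be visible immediately, even across wraparound. */
	assert(tick_before(now, g->socket_pressure));
}

/* Sockets of this group should stop growing while this returns true. */
static bool group_under_pressure(const struct group *g, unsigned long now)
{
	return tick_before(now, g->socket_pressure);
}
```

Because every event simply moves the expiry forward, a stream of edge-triggered events keeps the state asserted, and it lapses on its own one second after the last event.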

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h |  9 +
 include/net/sock.h |  4 
 include/net/tcp.h  |  4 
 mm/memcontrol.c|  1 +
 mm/vmpressure.c| 29 -
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d66ae18..b9990f7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -246,6 +246,7 @@ struct mem_cgroup {
 
 #ifdef CONFIG_INET
struct work_struct socket_work;
+   unsigned long socket_pressure;
 #endif
 
/* List of events which userspace want to receive */
@@ -696,6 +697,10 @@ void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int 
nr_pages);
+static inline bool mem_cgroup_socket_pressure(struct mem_cgroup *memcg)
+{
+   return time_before(jiffies, memcg->socket_pressure);
+}
 #else
 static inline bool mem_cgroup_do_sockets(void)
 {
@@ -716,6 +721,10 @@ static inline void mem_cgroup_uncharge_skmem(struct 
mem_cgroup *memcg,
 unsigned int nr_pages)
 {
 }
+static inline bool mem_cgroup_socket_pressure(struct mem_cgroup *memcg)
+{
+   return false;
+}
 #endif /* CONFIG_INET */
 
 #ifdef CONFIG_MEMCG_KMEM
diff --git a/include/net/sock.h b/include/net/sock.h
index 67795fc..22bfb9c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1087,6 +1087,10 @@ static inline bool sk_has_memory_pressure(const struct 
sock *sk)
 
 static inline bool sk_under_memory_pressure(const struct sock *sk)
 {
+   if (mem_cgroup_do_sockets() && sk->sk_memcg &&
+   mem_cgroup_socket_pressure(sk->sk_memcg))
+   return true;
+
if (!sk->sk_prot->memory_pressure)
return false;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 77b6c7e..c7d342c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -291,6 +291,10 @@ extern int tcp_memory_pressure;
 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
+   if (mem_cgroup_do_sockets() && sk->sk_memcg &&
+   mem_cgroup_socket_pressure(sk->sk_memcg))
+   return true;
+
return tcp_memory_pressure;
 }
 /*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cb1d6aa..2e09def 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4178,6 +4178,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
*parent_css)
 #endif
 #ifdef CONFIG_INET
INIT_WORK(&memcg->socket_work, socket_work_func);
+   memcg->socket_pressure = jiffies;
 #endif
return &memcg->css;
 
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 4c25e62..f64c0e1 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -137,14 +137,11 @@ struct vmpressure_event {
 };
 
 static bool vmpressure_event(struct vmpressure *vmpr,
-unsigned long scanned, unsigned long reclaimed)
+enum vmpressure_levels level)
 {
struct vmpressure_event *ev;
-   enum vmpressure_levels level;
bool signalled = false;
 
-   level = vmpressure_calc_level(scanned, reclaimed);
-
mutex_lock(&vmpr->events_lock);
 
list_for_each_entry(ev, &vmpr->events, node) {
@@ -162,6 +159,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 static void vmpressure_work_fn(struct work_struct *work)
 {
struct vmpressure *vmpr = work_to_vmpressure(work);
+   enum vmpressure_levels level;
unsigned long scanned;
unsigned long reclaimed;
 
@@ -185,8 +183,29 @@ static void vmpressure_work_fn(struct work_struct *work)
vmpr->reclaimed = 0;
spin_unlock(&vmpr->sr_lock);
 
+   level = vmpressure_calc_level(scanned, reclaimed);
+
+   if (level > VMPRESSURE_LOW) {
+   struct mem_cgroup *memcg;
+   /*
+* Let the socket buffer allocator know that we are
+* having trouble reclaiming LRU pages.
+*
+* For hysteresis, keep the pressure state asserted
+* for a second in which subsequent pressure events
+* can occur.
+*
+

[PATCH 6/8] mm: vmscan: simplify memcg vs. global shrinker invocation

2015-10-21 Thread Johannes Weiner

Letting shrink_slab() handle the root_mem_cgroup, and implicitly the
!CONFIG_MEMCG case, allows shrink_zone() to invoke the shrinkers
unconditionally from within the memcg iteration loop.

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h |  2 ++
 mm/vmscan.c| 31 ---
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6f1e0f8..d66ae18 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -482,6 +482,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
 
+#define root_mem_cgroup NULL
+
 static inline void mem_cgroup_events(struct mem_cgroup *memcg,
 enum mem_cgroup_events_index idx,
 unsigned int nr)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b52ecf..ecc2125 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -411,6 +411,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
struct shrinker *shrinker;
unsigned long freed = 0;
 
+   /* Global shrinker mode */
+   if (memcg == root_mem_cgroup)
+   memcg = NULL;
+
if (memcg && !memcg_kmem_is_active(memcg))
return 0;
 
@@ -2417,11 +2421,22 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
zone_lru_pages += lru_pages;
 
-   if (memcg && is_classzone)
+   /*
+* Shrink the slab caches in the same proportion that
+* the eligible LRU pages were scanned.
+*/
+   if (is_classzone) {
shrink_slab(sc->gfp_mask, zone_to_nid(zone),
memcg, sc->nr_scanned - scanned,
lru_pages);
 
+   if (reclaim_state) {
+   sc->nr_reclaimed +=
+   reclaim_state->reclaimed_slab;
+   reclaim_state->reclaimed_slab = 0;
+   }
+   }
+
/*
 * Direct reclaim and kswapd have to scan all memory
 * cgroups to fulfill the overall scan target for the
@@ -2439,20 +2454,6 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
}
} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
 
-   /*
-* Shrink the slab caches in the same proportion that
-* the eligible LRU pages were scanned.
-*/
-   if (global_reclaim(sc) && is_classzone)
-   shrink_slab(sc->gfp_mask, zone_to_nid(zone), NULL,
-   sc->nr_scanned - nr_scanned,
-   zone_lru_pages);
-
-   if (reclaim_state) {
-   sc->nr_reclaimed += reclaim_state->reclaimed_slab;
-   reclaim_state->reclaimed_slab = 0;
-   }
-
vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
   sc->nr_scanned - nr_scanned,
   sc->nr_reclaimed - nr_reclaimed);
-- 
2.6.1


[PATCH 7/8] mm: vmscan: report vmpressure at the level of reclaim activity

2015-10-21 Thread Johannes Weiner

The vmpressure metric is based on reclaim efficiency, which in turn is
an attribute of the LRU. However, vmpressure events are currently
reported at the source of pressure rather than at the reclaim level.

Switch the reporting to the reclaim level to allow finer-grained
analysis of which memcg is having trouble reclaiming its pages.

As far as memory.pressure_level interface semantics go, events are
escalated up the hierarchy until a listener is found, so this won't
affect existing users that listen at higher levels.

This also prepares vmpressure for hooking it up to the networking
stack's memory pressure code.
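
As a rough illustration of reporting at the reclaim level, the per-group efficiency could be classified from the scan and reclaim deltas of a single pass, as sketched below. The thresholds, the type names and both helpers are assumptions made for this example, and the model assumes that no more pages are reclaimed than were scanned.

```c
#include <assert.h>

enum pressure_level { PRESSURE_LOW, PRESSURE_MEDIUM, PRESSURE_CRITICAL };

/* Assumed thresholds: percent of scanned pages that were not reclaimed. */
#define LEVEL_MEDIUM_PCT	60UL
#define LEVEL_CRITICAL_PCT	95UL

/* Running totals, sampled before and after one group's reclaim pass. */
struct scan_totals {
	unsigned long nr_scanned;
	unsigned long nr_reclaimed;
};

/* Classify reclaim efficiency from pages scanned and pages reclaimed. */
static enum pressure_level pressure_level(unsigned long scanned,
					  unsigned long reclaimed)
{
	unsigned long pressure;

	assert(reclaimed <= scanned);
	if (scanned == 0)
		return PRESSURE_LOW;

	pressure = 100UL * (scanned - reclaimed) / scanned;
	if (pressure >= LEVEL_CRITICAL_PCT)
		return PRESSURE_CRITICAL;
	if (pressure >= LEVEL_MEDIUM_PCT)
		return PRESSURE_MEDIUM;
	return PRESSURE_LOW;
}

/* Report for one group: only the work done on its own lists counts. */
static enum pressure_level group_pressure(struct scan_totals before,
					  struct scan_totals after)
{
	return pressure_level(after.nr_scanned - before.nr_scanned,
			      after.nr_reclaimed - before.nr_reclaimed);
}
```

Sampling the totals inside the per-group loop, rather than once around it, is what lets the result be attributed to the group that is actually struggling.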

Signed-off-by: Johannes Weiner 
---
 mm/vmscan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ecc2125..50630c8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2404,6 +2404,7 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
memcg = mem_cgroup_iter(root, NULL, &reclaim);
do {
unsigned long lru_pages;
+   unsigned long reclaimed;
unsigned long scanned;
struct lruvec *lruvec;
int swappiness;
@@ -2416,6 +2417,7 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
 
lruvec = mem_cgroup_zone_lruvec(zone, memcg);
swappiness = mem_cgroup_swappiness(memcg);
+   reclaimed = sc->nr_reclaimed;
scanned = sc->nr_scanned;
 
shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
@@ -2437,6 +2439,10 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
}
}
 
+   vmpressure(sc->gfp_mask, memcg,
+  sc->nr_scanned - scanned,
+  sc->nr_reclaimed - reclaimed);
+
/*
 * Direct reclaim and kswapd have to scan all memory
 * cgroups to fulfill the overall scan target for the
@@ -2454,10 +2460,6 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
}
} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
 
-   vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
-  sc->nr_scanned - nr_scanned,
-  sc->nr_reclaimed - nr_reclaimed);
-
if (sc->nr_reclaimed - nr_reclaimed)
reclaimable = true;
 
-- 
2.6.1


[PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy

2015-10-21 Thread Johannes Weiner

Hi,

this series adds socket buffer memory tracking and accounting to the
unified hierarchy memory cgroup controller.

[ Networking people, at this time please check the diffstat below to
  avoid going into convulsions. ]

Socket buffer memory can make up a significant share of a workload's
memory footprint, and so it needs to be accounted and tracked out of
the box, along with other types of memory that can be directly linked
to userspace activity, in order to provide useful resource isolation.

Historically, socket buffers were accounted in a separate counter,
without any pressure equalization between anonymous memory, page
cache, and the socket buffers. When the socket buffer pool was
exhausted, buffer allocations would fail hard and cause network
performance to tank, regardless of whether there was still memory
available to the group or not. Likewise, struggling anonymous or cache
workingsets could not dip into an idle socket memory pool. Because of
this, the feature was not usable for many real life applications.

To not repeat this mistake, the new memory controller will account all
types of memory pages it is tracking on behalf of a cgroup in a single
pool. And upon pressure, the VM reclaims and shrinks whatever memory
in that pool is within its reach.

These patches add accounting for memory consumed by sockets associated
with a cgroup to the existing pool of anonymous pages and page cache.
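
The single-pool idea can be illustrated with a small userspace model. This is a sketch, not the series' code: struct pool, enum mem_type and the two helpers are hypothetical, and reclaim is reduced to an explicit uncharge.

```c
#include <assert.h>
#include <stdbool.h>

enum mem_type { MEM_ANON, MEM_CACHE, MEM_SOCK, MEM_NR_TYPES };

/* One limit for all consumers; per-type numbers are statistics only. */
struct pool {
	unsigned long limit;
	unsigned long usage;
	unsigned long stat[MEM_NR_TYPES];
};

/* Every type of memory competes for the same limit. */
static bool pool_charge(struct pool *p, enum mem_type type,
			unsigned long nr_pages)
{
	assert(type < MEM_NR_TYPES);
	if (p->usage + nr_pages > p->limit)
		return false;
	p->usage += nr_pages;
	p->stat[type] += nr_pages;
	return true;
}

/* Freed pages become available to any consumer, not just their own type. */
static void pool_uncharge(struct pool *p, enum mem_type type,
			  unsigned long nr_pages)
{
	assert(type < MEM_NR_TYPES);
	assert(p->stat[type] >= nr_pages);
	p->usage -= nr_pages;
	p->stat[type] -= nr_pages;
}
```

With separate counters, pages released by one consumer could not be used by another; here, anonymous pages that are uncharged immediately make room for socket buffers, and vice versa.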

Patch #3 reworks the existing memcg socket infrastructure. It has many
provisions for future plans that won't materialize, and much of this
simply evaporates. The networking people should be happy about this.

Patch #5 adds accounting and tracking of socket memory to the unified
hierarchy memory controller, as described above. It uses the existing
per-cpu charge caches and triggers high limit reclaim asynchronously.

Patch #8 uses the vmpressure extension to equalize pressure between
the pages tracked natively by the VM and socket buffer pages. As the
pool is shared, it makes sense that while natively tracked pages are
under duress the network transmit windows are also not increased.

As per above, this is an essential part of the new memory controller's
core functionality. With the unified hierarchy nearing release, please
consider this for 4.4.

 include/linux/memcontrol.h   |  90 +---
 include/linux/page_counter.h |   6 +-
 include/net/sock.h   | 139 ++--
 include/net/tcp.h|   5 +-
 include/net/tcp_memcontrol.h |   7 --
 mm/backing-dev.c |   2 +-
 mm/hugetlb_cgroup.c  |   3 +-
 mm/memcontrol.c  | 235 ++---
 mm/page_counter.c|  14 +--
 mm/vmpressure.c  |  29 -
 mm/vmscan.c  |  41 +++
 net/core/sock.c  |  78 --
 net/ipv4/sysctl_net_ipv4.c   |   1 -
 net/ipv4/tcp.c   |   3 +-
 net/ipv4/tcp_ipv4.c  |   9 +-
 net/ipv4/tcp_memcontrol.c| 147 --
 net/ipv4/tcp_output.c|   6 +-
 net/ipv6/tcp_ipv6.c  |   3 -
 18 files changed, 319 insertions(+), 499 deletions(-)


[PATCH 4/8] mm: memcontrol: prepare for unified hierarchy socket accounting

2015-10-21 Thread Johannes Weiner

The unified hierarchy memory controller will account socket
memory. Move the infrastructure functions accordingly.

Signed-off-by: Johannes Weiner 
---
 mm/memcontrol.c | 136 
 1 file changed, 68 insertions(+), 68 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c41e6d7..3789050 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -287,74 +287,6 @@ static inline struct mem_cgroup 
*mem_cgroup_from_id(unsigned short id)
return mem_cgroup_from_css(css);
 }
 
-/* Writing them here to avoid exposing memcg's inner layout */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
-
-DEFINE_STATIC_KEY_FALSE(mem_cgroup_sockets);
-
-void sock_update_memcg(struct sock *sk)
-{
-   struct mem_cgroup *memcg;
-   /*
-* Socket cloning can throw us here with sk_cgrp already
-* filled. It won't however, necessarily happen from
-* process context. So the test for root memcg given
-* the current task's memcg won't help us in this case.
-*
-* Respecting the original socket's memcg is a better
-* decision in this case.
-*/
-   if (sk->sk_memcg) {
-   BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
-   css_get(&sk->sk_memcg->css);
-   return;
-   }
-
-   rcu_read_lock();
-   memcg = mem_cgroup_from_task(current);
-   if (css_tryget_online(&memcg->css))
-   sk->sk_memcg = memcg;
-   rcu_read_unlock();
-}
-EXPORT_SYMBOL(sock_update_memcg);
-
-void sock_release_memcg(struct sock *sk)
-{
-   if (sk->sk_memcg)
-   css_put(&sk->sk_memcg->css);
-}
-
-/**
- * mem_cgroup_charge_skmem - charge socket memory
- * @memcg: memcg to charge
- * @nr_pages: number of pages to charge
- *
- * Charges @nr_pages to @memcg. Returns %true if the charge fit within
- * the memcg's configured limit, %false if the charge had to be forced.
- */
-bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-   struct page_counter *counter;
-
-   if (page_counter_try_charge(&memcg->skmem, nr_pages, &counter))
-   return true;
-
-   page_counter_charge(&memcg->skmem, nr_pages);
-   return false;
-}
-
-/**
- * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @memcg: memcg to uncharge
- * @nr_pages: number of pages to uncharge
- */
-void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-   page_counter_uncharge(&memcg->skmem, nr_pages);
-}
-
-#endif
-
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * This will be the memcg's index in each cache's ->memcg_params.memcg_caches.
@@ -5521,6 +5453,74 @@ void mem_cgroup_replace_page(struct page *oldpage, 
struct page *newpage)
commit_charge(newpage, memcg, true);
 }
 
+/* Writing them here to avoid exposing memcg's inner layout */
+#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+
+DEFINE_STATIC_KEY_FALSE(mem_cgroup_sockets);
+
+void sock_update_memcg(struct sock *sk)
+{
+   struct mem_cgroup *memcg;
+   /*
+* Socket cloning can throw us here with sk_cgrp already
+* filled. It won't however, necessarily happen from
+* process context. So the test for root memcg given
+* the current task's memcg won't help us in this case.
+*
+* Respecting the original socket's memcg is a better
+* decision in this case.
+*/
+   if (sk->sk_memcg) {
+   BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+   css_get(&sk->sk_memcg->css);
+   return;
+   }
+
+   rcu_read_lock();
+   memcg = mem_cgroup_from_task(current);
+   if (css_tryget_online(&memcg->css))
+   sk->sk_memcg = memcg;
+   rcu_read_unlock();
+}
+EXPORT_SYMBOL(sock_update_memcg);
+
+void sock_release_memcg(struct sock *sk)
+{
+   if (sk->sk_memcg)
+   css_put(&sk->sk_memcg->css);
+}
+
+/**
+ * mem_cgroup_charge_skmem - charge socket memory
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * the memcg's configured limit, %false if the charge had to be forced.
+ */
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+   struct page_counter *counter;
+
+   if (page_counter_try_charge(&memcg->skmem, nr_pages, &counter))
+   return true;
+
+   page_counter_charge(&memcg->skmem, nr_pages);
+   return false;
+}
+
+/**
+ * mem_cgroup_uncharge_skmem - uncharge socket memory
+ * @memcg: memcg to uncharge
+ * @nr_pages: number of pages to uncharge
+ */
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+   page_counter_uncharge(&memcg->skmem, nr_pages);
+}
+
+#endif
+
 /*
  * subsys_initcall() for memory controller.
  *
-- 
2.6.1


[PATCH 1/2] can: xilinx: use readl/writel instead of ioread/iowrite

2015-10-21 Thread Kedareswara rao Appana

The driver only supports memory-mapped I/O [by ioremap()],
so readl/writel is actually the right thing to do, IMO.
During validation of this driver and IP on a 64-bit ARM processor,
tx packets were dropped with iowrite32() when sending a large number of
packets. Adding barriers for each tx FIFO register write fixes this issue.
Using writel() instead of barriers also fixes it.

Signed-off-by: Kedareswara rao Appana 
---
 drivers/net/can/xilinx_can.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
index 6114214..055d6f3 100644
--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -170,7 +170,7 @@ static const struct can_bittiming_const 
xcan_bittiming_const = {
 static void xcan_write_reg_le(const struct xcan_priv *priv, enum xcan_reg reg,
u32 val)
 {
-   iowrite32(val, priv->reg_base + reg);
+   writel(val, priv->reg_base + reg);
 }
 
 /**
@@ -183,7 +183,7 @@ static void xcan_write_reg_le(const struct xcan_priv *priv, 
enum xcan_reg reg,
  */
 static u32 xcan_read_reg_le(const struct xcan_priv *priv, enum xcan_reg reg)
 {
-   return ioread32(priv->reg_base + reg);
+   return readl(priv->reg_base + reg);
 }
 
 /**
-- 
2.1.2


[PATCH v6] can: xilinx: Convert to runtime_pm

2015-10-21 Thread Kedareswara rao Appana

Instead of enabling/disabling clocks at several locations in the driver,
use the runtime_pm framework. This consolidates the actions for runtime PM
in the appropriate callbacks and makes the driver more readable and maintainable.
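
The consolidation can be pictured with a small userspace model of usage counting. This is a sketch under stated assumptions: struct device_model and the model_*() helpers are hypothetical, and the put side acts synchronously, whereas the real runtime PM framework may defer the suspend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: one usage counter guards both clocks. */
struct device_model {
	int usage;
	bool clocks_on;
};

/* Resume callback: the only place where clocks are switched on. */
static int model_runtime_resume(struct device_model *d)
{
	d->clocks_on = true;
	return 0;
}

/* Suspend callback: the only place where clocks are switched off. */
static void model_runtime_suspend(struct device_model *d)
{
	d->clocks_on = false;
}

/* First user powers the device up; later users only bump the counter. */
static int model_get(struct device_model *d)
{
	if (d->usage++ == 0)
		return model_runtime_resume(d);
	return 0;
}

/* Last user powers the device down. */
static void model_put(struct device_model *d)
{
	assert(d->usage > 0);
	if (--d->usage == 0)
		model_runtime_suspend(d);
}
```

Paths such as open, close and the error counter read then only need a balanced get/put pair, instead of each carrying its own clock enable, disable and unwind labels.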

Signed-off-by: Kedareswara rao Appana 
---
Sorry for the long delay in sending this next version of the patch;
somehow I couldn't manage to send it earlier.
 
Changes for v6:
 - Updated the driver with review comments as suggested by Marc.
Changes for v5:
 - Updated with the review comments.
   Updated the remove function to use runtime_pm.
Changes for v4:
 - Updated with the review comments.
Changes for v3:
  - Converted the driver to use runtime_pm.
Changes for v2:
  - Removed the struct platform_device* from suspend/resume
as suggested by Lothar.

 drivers/net/can/xilinx_can.c | 176 +++
 1 file changed, 110 insertions(+), 66 deletions(-)

diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
index fc55e8e..6114214 100644
--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_NAME"xilinx_can"
 
@@ -138,7 +139,7 @@ struct xcan_priv {
u32 (*read_reg)(const struct xcan_priv *priv, enum xcan_reg reg);
void (*write_reg)(const struct xcan_priv *priv, enum xcan_reg reg,
u32 val);
-   struct net_device *dev;
+   struct device *dev;
void __iomem *reg_base;
unsigned long irq_flags;
struct clk *bus_clk;
@@ -843,6 +844,13 @@ static int xcan_open(struct net_device *ndev)
struct xcan_priv *priv = netdev_priv(ndev);
int ret;
 
+   ret = pm_runtime_get_sync(priv->dev);
+   if (ret < 0) {
+   netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
+   __func__, ret);
+   return ret;
+   }
+
ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
ndev->name, ndev);
if (ret < 0) {
@@ -850,29 +858,17 @@ static int xcan_open(struct net_device *ndev)
goto err;
}
 
-   ret = clk_prepare_enable(priv->can_clk);
-   if (ret) {
-   netdev_err(ndev, "unable to enable device clock\n");
-   goto err_irq;
-   }
-
-   ret = clk_prepare_enable(priv->bus_clk);
-   if (ret) {
-   netdev_err(ndev, "unable to enable bus clock\n");
-   goto err_can_clk;
-   }
-
/* Set chip into reset mode */
ret = set_reset_mode(ndev);
if (ret < 0) {
netdev_err(ndev, "mode resetting failed!\n");
-   goto err_bus_clk;
+   goto err_irq;
}
 
/* Common open */
ret = open_candev(ndev);
if (ret)
-   goto err_bus_clk;
+   goto err_irq;
 
ret = xcan_chip_start(ndev);
if (ret < 0) {
@@ -888,13 +884,11 @@ static int xcan_open(struct net_device *ndev)
 
 err_candev:
close_candev(ndev);
-err_bus_clk:
-   clk_disable_unprepare(priv->bus_clk);
-err_can_clk:
-   clk_disable_unprepare(priv->can_clk);
 err_irq:
free_irq(ndev->irq, ndev);
 err:
+   pm_runtime_put(priv->dev);
+
return ret;
 }
 
@@ -911,12 +905,11 @@ static int xcan_close(struct net_device *ndev)
netif_stop_queue(ndev);
napi_disable(&priv->napi);
xcan_chip_stop(ndev);
-   clk_disable_unprepare(priv->bus_clk);
-   clk_disable_unprepare(priv->can_clk);
free_irq(ndev->irq, ndev);
close_candev(ndev);
 
can_led_event(ndev, CAN_LED_EVENT_STOP);
+   pm_runtime_put(priv->dev);
 
return 0;
 }
@@ -935,27 +928,20 @@ static int xcan_get_berr_counter(const struct net_device 
*ndev,
struct xcan_priv *priv = netdev_priv(ndev);
int ret;
 
-   ret = clk_prepare_enable(priv->can_clk);
-   if (ret)
-   goto err;
-
-   ret = clk_prepare_enable(priv->bus_clk);
-   if (ret)
-   goto err_clk;
+   ret = pm_runtime_get_sync(priv->dev);
+   if (ret < 0) {
+   netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
+   __func__, ret);
+   return ret;
+   }
 
bec->txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK;
bec->rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) &
XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT);
 
-   clk_disable_unprepare(priv->bus_clk);
-   clk_disable_unprepare(priv->can_clk);
+   pm_runtime_put(priv->dev);
 
return 0;
-
-err_clk:
-   clk_disable_unprepare(priv->can_clk);
-err:
-   return ret;
 }
 
 
@@ -968,15 +954,45 @@ static const struct net_device_ops xcan_netdev_ops = {
 
 /**
  * xcan_suspend - Suspend method for the driver
- * @dev:   Address of the platform_device structure
+ *

Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread Alan Burlison

On 22/10/2015 02:29, David Miller wrote:

From: Al Viro 
Date: Wed, 21 Oct 2015 19:51:04 +0100

Sure, but the upkeep of data structures it would need is there
whether you actually end up triggering it or not.  Both in
memory footprint and in cacheline pingpong...

+1

It's been said that the current mechanisms in Linux & some BSD variants 
can be subject to races, and the behaviour exhibited doesn't conform to 
POSIX, for example requiring the use of shutdown() on unconnected 
sockets because close() doesn't kick off other threads accept()ing on 
the same fd. I'd be interested to hear if there's a better and more 
performant way of handling the situation that doesn't involve doing the 
sort of bookkeeping Casper described.

To quote one of my colleague's favourite sayings: Performance is a goal, 
correctness is a constraint.

--
Alan Burlison
--

Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper

2015-10-21 Thread Wangnan (F)

After applying this patch, I can no longer pass a perf_event to perf
like this:


 # perf record -a -e evt=cycles -e 
./test_config_map.c/maps.pmu_map.event=evt/ --exclude-perf ls


With -v it outputs:

...
adding perf_bpf_probe:func_write
adding perf_bpf_probe:func_write to 0x367d6a0
add bpf event perf_bpf_probe:func_write_return and attach bpf program 6
adding perf_bpf_probe:func_write_return
adding perf_bpf_probe:func_write_return to 0x3a7fc40
mmap size 528384B
ERROR: failed to insert value to pmu_map[0]
ERROR: Apply config to BPF failed: Invalid option for map, add -v to see 
detail

Opening /sys/kernel/debug/tracing//kprobe_events write=
...

Looks like perf sets attr.inherit for cycles? I'll look into this problem.

Thank you.

On 2015/10/22 6:58, Alexei Starovoitov wrote:

Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to perf_event_array map
   (do this check statically at map insertion time)
- dynamically check that event is local and !pmu->count
Otherwise buggy bpf program can cause kernel splat.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the 
selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov 
---
v1->v2: fix compile in case of !CONFIG_PERF_EVENTS

This patch is on top of
http://patchwork.ozlabs.org/patch/533585/
to avoid conflicts.
Even in the worst case the crash is not possible.
Only warn_on_once, so imo net-next is ok.

  kernel/bpf/arraymap.c |9 +
  kernel/events/core.c  |   16 ++--
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..75529cc94304 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -294,10 +294,11 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map 
*map, int fd)
if (IS_ERR(attr))
return (void *)attr;
  
-	if (attr->type != PERF_TYPE_RAW &&

-   !(attr->type == PERF_TYPE_SOFTWARE &&
- attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-   attr->type != PERF_TYPE_HARDWARE) {
+   if ((attr->type != PERF_TYPE_RAW &&
+!(attr->type == PERF_TYPE_SOFTWARE &&
+  attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+attr->type != PERF_TYPE_HARDWARE) ||
+   attr->inherit) {
perf_event_release_kernel(event);
return ERR_PTR(-EINVAL);
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 64754bfecd70..0b6333265872 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3258,7 +3258,7 @@ static inline u64 perf_event_count(struct perf_event 
*event)
  u64 perf_event_read_local(struct perf_event *event)
  {
unsigned long flags;
-   u64 val;
+   u64 val = -EINVAL;
  
  	/*

 * Disabling interrupts avoids all counter scheduling (context
@@ -3267,12 +3267,14 @@ u64 perf_event_read_local(struct perf_event *event)
local_irq_save(flags);
  
  	/* If this is a per-task event, it must be for current */

-   WARN_ON_ONCE((event->attach_state & PERF_ATTACH_TASK) &&
-event->hw.target != current);
+   if ((event->attach_state & PERF_ATTACH_TASK) &&
+   event->hw.target != current)
+   goto out;
  
  	/* If this is a per-CPU event, it must be for this CPU */

-   WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_TASK) &&
-event->cpu != smp_processor_id());
+   if (!(event->attach_state & PERF_ATTACH_TASK) &&
+   event->cpu != smp_processor_id())
+   goto out;
  
  	/*

 * It must not be an event with inherit set, we cannot read
@@ -3284,7 +3286,8 @@ u64 perf_event_read_local(struct perf_event *event)
 * It must not have a pmu::count method, those are not
 * NMI safe.
 */
-   WARN_ON_ONCE(event->pmu->count);
+   if (event->pmu->count)
+   goto out;
  
  	/*

 * If the event is currently on this CPU, its either a per-task event,
@@ -3295,6 +3298,7 @@ u64 perf_event_read_local(struct perf_event *event)
event->pmu->read(event);
  
  	val = local64_read(&event->count);

+out:
local_irq_restore(flags);
  
  	return val;




Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper

2015-10-21 Thread Wangnan (F)




On 2015/10/22 6:58, Alexei Starovoitov wrote:

Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to perf_event_array map
   (do this check statically at map insertion time)
- dynamically check that event is local and !pmu->count
Otherwise buggy bpf program can cause kernel splat.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the 
selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov 
---
v1->v2: fix compile in case of !CONFIG_PERF_EVENTS

This patch is on top of
http://patchwork.ozlabs.org/patch/533585/
to avoid conflicts.
Even in the worst case the crash is not possible.
Only warn_on_once, so imo net-next is ok.

  kernel/bpf/arraymap.c |9 +
  kernel/events/core.c  |   16 ++--
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..75529cc94304 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -294,10 +294,11 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map 
*map, int fd)
if (IS_ERR(attr))
return (void *)attr;
  
-	if (attr->type != PERF_TYPE_RAW &&

-   !(attr->type == PERF_TYPE_SOFTWARE &&
- attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-   attr->type != PERF_TYPE_HARDWARE) {
+   if ((attr->type != PERF_TYPE_RAW &&
+!(attr->type == PERF_TYPE_SOFTWARE &&
+  attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+attr->type != PERF_TYPE_HARDWARE) ||
+   attr->inherit) {


This 'if' statement is quite complex. What about using an inline function
instead?
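
One way the check could be factored out, shown only as a sketch. The helper
name perf_event_allowed_in_map() and struct perf_attr_view are hypothetical,
and the constants are local stand-ins so that the fragment builds in
userspace; in the kernel they would come from the perf_event headers. The
helper accepts exactly the cases that the patched 'if' lets through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the perf constants (assumption: userspace build). */
enum { PERF_TYPE_HARDWARE = 0, PERF_TYPE_SOFTWARE = 1, PERF_TYPE_RAW = 4 };
enum { PERF_COUNT_SW_BPF_OUTPUT = 10 };

/* Hypothetical reduced view of the attribute fields the check looks at. */
struct perf_attr_view {
	uint32_t type;
	uint64_t config;
	bool inherit;
};

/*
 * Returns true when the event may be stored in a perf_event_array map:
 * raw or hardware events, or the software BPF output event, and never
 * an event with inherit set.
 */
static inline bool perf_event_allowed_in_map(const struct perf_attr_view *attr)
{
	if (attr->inherit)
		return false;

	switch (attr->type) {
	case PERF_TYPE_RAW:
	case PERF_TYPE_HARDWARE:
		return true;
	case PERF_TYPE_SOFTWARE:
		return attr->config == PERF_COUNT_SW_BPF_OUTPUT;
	default:
		return false;
	}
}
```

With such a helper the call site would reduce to a single negated call
followed by the existing release and ERR_PTR(-EINVAL) path.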


Thank you.


[PATCH net-next RFC 2/2] vhost_net: basic polling support

2015-10-21 Thread Jason Wang

This patch tries to poll for newly added tx buffers for a while at the
end of tx processing. The maximum time spent on polling is limited
through a module parameter. To avoid blocking rx, the loop ends if
there is other work queued on vhost, so in effect the socket receive
queue is polled as well.
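
The idea can be sketched in userspace as a deadline-bounded poll loop. This
is an illustration under assumptions, not the vhost code: now_us(),
busy_poll_for_buffer() and the sample callbacks are hypothetical names, the
timeout is taken to be in microseconds, and the need_resched() and
single_task_running() checks from the patch are not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the module parameter, in microseconds. */
static unsigned long busyloop_timeout_us = 50;

/* Monotonic clock in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

typedef bool (*poll_fn)(void *ctx);

/*
 * Spin until a buffer shows up, other work is queued, or the deadline
 * passes. Returns true only when a buffer was found.
 */
static bool busy_poll_for_buffer(poll_fn buffer_ready, poll_fn other_work,
				 void *ctx)
{
	uint64_t endtime = now_us() + busyloop_timeout_us;

	for (;;) {
		if (buffer_ready(ctx))
			return true;
		if (!busyloop_timeout_us || other_work(ctx) ||
		    now_us() > endtime)
			return false;
	}
}

/* Sample callbacks used to exercise the loop. */
static bool ready_after_countdown(void *ctx)
{
	int *n = ctx;

	return (*n)-- <= 0;
}

static bool never_ready(void *ctx)
{
	(void)ctx;
	return false;
}

static bool no_other_work(void *ctx)
{
	(void)ctx;
	return false;
}

static bool always_other_work(void *ctx)
{
	(void)ctx;
	return true;
}
```

A timeout of zero disables polling entirely, which mirrors the
busyloop_timeout test at the start of tx_can_busy_poll() in the patch.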

busyloop_timeout = 50 gives us the following improvement on TCP_RR test:

size/session/+thu%/+normalize%
1/ 1/   +5%/  -20%
1/50/  +17%/   +3%

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e..bbb522a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -31,7 +31,9 @@
 #include "vhost.h"
 
 static int experimental_zcopytx = 1;
+static int busyloop_timeout = 50;
 module_param(experimental_zcopytx, int, 0444);
+module_param(busyloop_timeout, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
 
@@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info 
*ubuf, bool success)
rcu_read_unlock_bh();
 }
 
+static bool tx_can_busy_poll(struct vhost_dev *dev,
+unsigned long endtime)
+{
+   unsigned long now = local_clock() >> 10;
+
+   return busyloop_timeout && !need_resched() &&
+  !time_after(now, endtime) && !vhost_has_work(dev) &&
+  single_task_running();
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = &nvq->vq;
+   unsigned long endtime;
unsigned out, in;
int head;
struct msghdr msg = {
@@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
  % UIO_MAXIOV == nvq->done_idx))
break;
 
+   endtime  = (local_clock() >> 10) + busyloop_timeout;
+again:
head = vhost_get_vq_desc(vq, vq->iov,
 ARRAY_SIZE(vq->iov),
 &out, &in,
@@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
break;
/* Nothing new?  Wait for eventfd to tell us they refilled. */
if (head == vq->num) {
+   if (tx_can_busy_poll(vq->dev, endtime)) {
+   cpu_relax();
+   goto again;
+   }
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
vhost_disable_notify(&net->dev, vq);
continue;
-- 
1.8.3.1


[PATCH net-next RFC 1/2] vhost: introduce vhost_has_work()

2015-10-21 Thread Jason Wang

This patch introduces a helper which gives a hint as to whether or not
there is work queued in the work list.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 6 ++
 drivers/vhost/vhost.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index eec2f11..d42d11e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -245,6 +245,12 @@ void vhost_work_queue(struct vhost_dev *dev, struct 
vhost_work *work)
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
+bool vhost_has_work(struct vhost_dev *dev)
+{
+   return !list_empty(&dev->work_list);
+}
+EXPORT_SYMBOL_GPL(vhost_has_work);
+
 void vhost_poll_queue(struct vhost_poll *poll)
 {
vhost_work_queue(poll->dev, &poll->work);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 4772862..ea0327d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -37,6 +37,7 @@ struct vhost_poll {
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+bool vhost_has_work(struct vhost_dev *dev);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 unsigned long mask, struct vhost_dev *dev);
-- 
1.8.3.1


Re: [PATCH] pcnet32: fix a logic error with pci_set_dma_mask

2015-10-21 Thread Don Fry

On Mon, 2015-10-12 at 05:38 -0700, David Miller wrote:
> From: Geliang Tang 
> Date: Fri,  9 Oct 2015 03:45:39 -0700
> 
> > pcnet32 can't work on my machine recently. It says "architecture
> > does not support 32bit PCI busmaster DMA". There is a logic error
> > in it: pci_set_dma_mask() return 0 means return successfully.
> > 
> > Signed-off-by: Geliang Tang 
> 
> This driver doesn't call pci_set_dma_mask() in any of my tree(s).
I failed.  My system with pcnet32 boards was down with a dead power
supply and a visual review was not good enough.  I missed that
pci_dma_supported returns 1 on success and pci_set_dma_mask returns 0 on
success.  The original patch needs to have the ! removed as Geliang Tang
points out.
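
The mismatch in return conventions can be modelled in plain C. This is only
a sketch: fake_dma_supported() and fake_set_dma_mask() are hypothetical
stand-ins that mimic the conventions described above (1 on success versus 0
on success), not the real PCI API, and the probe_*() helpers are invented
for illustration.

```c
#include <stdint.h>

#define FAKE_DMA_MASK 0xffffffffULL	/* hypothetical 32-bit DMA mask */
#define FAKE_EIO 5			/* hypothetical error number */

/* Stand-in: returns 1 when the mask is supported, 0 otherwise. */
static int fake_dma_supported(uint64_t mask)
{
	return mask >= FAKE_DMA_MASK;
}

/* Stand-in: returns 0 on success, a negative error otherwise. */
static int fake_set_dma_mask(uint64_t mask)
{
	return fake_dma_supported(mask) ? 0 : -FAKE_EIO;
}

/*
 * Each probe returns 0 when it would carry on and -1 when it would bail
 * out with the "does not support 32bit PCI busmaster DMA" message.
 */
static int probe_old(uint64_t mask)
{
	if (!fake_dma_supported(mask))	/* 0 means unsupported */
		return -1;
	return 0;
}

static int probe_buggy(uint64_t mask)
{
	if (!fake_set_dma_mask(mask))	/* wrong: 0 means success here */
		return -1;
	return 0;
}

static int probe_fixed(uint64_t mask)
{
	if (fake_set_dma_mask(mask))	/* nonzero means failure */
		return -1;
	return 0;
}
```

In this model probe_buggy() rejects a supported mask and accepts an
unsupported one, which matches the symptom reported in the quoted message.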

Acked-by:  Don Fry 


Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper

2015-10-21 Thread Wangnan (F)




On 2015/10/22 13:00, Alexei Starovoitov wrote:

> On 10/21/15 9:49 PM, Wangnan (F) wrote:
>
>> After applying this patch I'm unable to use perf passing perf_event
>> again like this:
>
> please do not top post and trim your replies.
>
>>   # perf record -a -e evt=cycles -e
>> ./test_config_map.c/maps.pmu_map.event=evt/ --exclude-perf ls
>>
>> With -v it output:
>>
>> ...
>> adding perf_bpf_probe:func_write
>> adding perf_bpf_probe:func_write to 0x367d6a0
>> add bpf event perf_bpf_probe:func_write_return and attach bpf program 6
>> adding perf_bpf_probe:func_write_return
>> adding perf_bpf_probe:func_write_return to 0x3a7fc40
>> mmap size 528384B
>> ERROR: failed to insert value to pmu_map[0]
>> ERROR: Apply config to BPF failed: Invalid option for map, add -v to see
>> detail
>> Opening /sys/kernel/debug/tracing//kprobe_events write=
>> ...
>>
>> Looks like perf sets attr.inherit for cycles? I'll look into this
>> problem.
>
> yes. that's perf default.
> How did it even work before?!
> I was testing with your samples/bpf/tracex6 that sets inherit to zero.



Tested perf record -i option and it works for me:

# echo "" > /sys/kernel/debug/tracing/trace
# perf record -i -a -e evt=cycles -e ./test_config_map.c/maps.pmu_map.event=evt/ --exclude-perf ls

# cat /sys/kernel/debug/tracing/trace  | grep ls
  ls-8227  [001] dN..  2526.184611: : pmu inc: 82270
  ls-8227  [001] dN..  2526.184626: : pmu inc: 40951
  ls-8227  [001] dN..  2526.184642: : pmu inc: 50659
  ls-8227  [001] dN..  2526.184657: : pmu inc: 43511
  ls-8227  [001] dN..  2526.184675: : pmu inc: 56921
...
And no warning message found in dmesg.

So I think your fix is good, we should improve perf.

Thank you.





[PATCH 5/8] mm: memcontrol: account socket memory on unified hierarchy

2015-10-21 Thread Johannes Weiner

Socket memory can be a significant share of overall memory consumed by
common workloads. In order to provide reasonable resource isolation
out-of-the-box in the unified hierarchy, this type of memory needs to
be accounted and tracked per default in the memory controller.

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h | 16 ++--
 mm/memcontrol.c| 95 --
 2 files changed, 87 insertions(+), 24 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5b72f83..6f1e0f8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -244,6 +244,10 @@ struct mem_cgroup {
struct wb_domain cgwb_domain;
 #endif
 
+#ifdef CONFIG_INET
+   struct work_struct socket_work;
+#endif
+
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -676,11 +680,15 @@ static inline void mem_cgroup_wb_stats(struct 
bdi_writeback *wb,
 #endif /* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
-extern struct static_key_false mem_cgroup_sockets;
+#ifdef CONFIG_INET
+extern struct static_key_true mem_cgroup_sockets;
 static inline bool mem_cgroup_do_sockets(void)
 {
-   return static_branch_unlikely(&mem_cgroup_sockets);
+   if (mem_cgroup_disabled())
+   return false;
+   if (!static_branch_likely(&mem_cgroup_sockets))
+   return false;
+   return true;
 }
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
@@ -706,7 +714,7 @@ static inline void mem_cgroup_uncharge_skmem(struct 
mem_cgroup *memcg,
 unsigned int nr_pages)
 {
 }
-#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
+#endif /* CONFIG_INET */
 
 #ifdef CONFIG_MEMCG_KMEM
 extern struct static_key memcg_kmem_enabled_key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3789050..cb1d6aa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1916,6 +1916,18 @@ static int memcg_cpu_hotplug_callback(struct 
notifier_block *nb,
return NOTIFY_OK;
 }
 
+static void reclaim_high(struct mem_cgroup *memcg,
+unsigned int nr_pages,
+gfp_t gfp_mask)
+{
+   do {
+   if (page_counter_read(&memcg->memory) <= memcg->high)
+   continue;
+   mem_cgroup_events(memcg, MEMCG_HIGH, 1);
+   try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
+   } while ((memcg = parent_mem_cgroup(memcg)));
+}
+
 /*
  * Scheduled by try_charge() to be executed from the userland return path
  * and reclaims memory over the high limit.
@@ -1923,20 +1935,13 @@ static int memcg_cpu_hotplug_callback(struct 
notifier_block *nb,
 void mem_cgroup_handle_over_high(void)
 {
unsigned int nr_pages = current->memcg_nr_pages_over_high;
-   struct mem_cgroup *memcg, *pos;
+   struct mem_cgroup *memcg;
 
if (likely(!nr_pages))
return;
 
-   pos = memcg = get_mem_cgroup_from_mm(current->mm);
-
-   do {
-   if (page_counter_read(&pos->memory) <= pos->high)
-   continue;
-   mem_cgroup_events(pos, MEMCG_HIGH, 1);
-   try_to_free_mem_cgroup_pages(pos, nr_pages, GFP_KERNEL, true);
-   } while ((pos = parent_mem_cgroup(pos)));
-
+   memcg = get_mem_cgroup_from_mm(current->mm);
+   reclaim_high(memcg, nr_pages, GFP_KERNEL);
css_put(&memcg->css);
current->memcg_nr_pages_over_high = 0;
 }
@@ -4129,6 +4134,8 @@ struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup 
*memcg)
 }
 EXPORT_SYMBOL(parent_mem_cgroup);
 
+static void socket_work_func(struct work_struct *work);
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
@@ -4169,6 +4176,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
*parent_css)
 #ifdef CONFIG_CGROUP_WRITEBACK
INIT_LIST_HEAD(&memcg->cgwb_list);
 #endif
+#ifdef CONFIG_INET
+   INIT_WORK(&memcg->socket_work, socket_work_func);
+#endif
return &memcg->css;
 
 free_out:
@@ -4266,6 +4276,8 @@ static void mem_cgroup_css_free(struct 
cgroup_subsys_state *css)
 {
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+   cancel_work_sync(&memcg->socket_work);
+
memcg_destroy_kmem(memcg);
__mem_cgroup_free(memcg);
 }
@@ -4948,10 +4960,15 @@ static void mem_cgroup_bind(struct cgroup_subsys_state 
*root_css)
 * guarantees that @root doesn't have any children, so turning it
 * on for the root memcg is enough.
 */
-   if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+   if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
root_mem_cgroup->use_hierarchy = true;
-   else
+#ifdef CONFIG_INET
+   /* unified hierarchy always counts skmem */
+

Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread Al Viro

On Wed, Oct 21, 2015 at 10:33:04PM +0200, casper@oracle.com wrote:
> 
> >On Wed, Oct 21, 2015 at 03:38:51PM +0100, Alan Burlison wrote:
> >
> >> >There's going to be a notion of "last close"; that's what this refcount is
> >> >about and _that_ is more than implementation detail.
> >> 
> >> Yes, POSIX distinguishes between "file descriptor" and "file
> >> description" (ugh!) and the close() page says:
> >
> >Would've been better if they went for something like "IO channel" for
> >the latter ;-/
> 
> Or at least some other word.  A file descriptor is just an index to
> a list of file pointers (and wasn't named so?)

*nod*

There's no less than 3 distinct notions associated with the word "file" -
"file as collection of bytes on filesystem", "opened file as IO channel" and
"file descriptor", all related ;-/  "File description" vs. "file descriptor"
is atrociously bad terminology.

> >Unless they want to consider in-flight descriptor-passing datagrams as
> >collections of file descriptors, the quoted sentence is seriously misleading.
> >And then there's mmap(), which they do kinda-sorta mention...
> 
> Well, a file descriptor really only exists in the context of a process; 
> in-flight it is no longer a file descriptor as there is no process context
> with a file descriptor table; so pointers to file descriptions are passed 
> around.

Yes.  Note, BTW, that descriptor contains a bit more than a pointer - there
are properties associated with it (close-on-exec and is-it-already-claimed),
which makes abusing it for describing SCM_RIGHTS payloads even more of a
stretch.  IOW, description of semantics for close() and friends needs fixing -
it simply does not match the situation on anything that would be anywhere
near POSIX compliance in other areas.

> >Sure, but the upkeep of data structures it would need is there
> >whether you actually end up triggering it or not.  Both in
> >memory footprint and in cacheline pingpong...
> 
> Most of the work on using a file descriptor is local to the thread.

Using - sure, but what of cacheline dirtied every time you resolve a
descriptor to file reference?  How much does it cover and to what
degree is that local to thread?  When it's a refcount inside struct file -
no big deal, we'll be reading the same cacheline anyway and unless several
threads are pounding on the same file with syscalls at the same time,
that won't be a big deal.  But when refcounts are associated with
descriptors...

In case of Linux we have two bitmaps and an array of pointers associated
with descriptor table.  They grow on demand (in parallel):
    * reserving a descriptor is done under ->file_lock (dropped/regained
      around memory allocation if we end up expanding the sucker, actual
      reassignment of pointers to array/bitmaps is under that spinlock)
    * installing a pointer is lockless (we wait for ongoing resize to
      settle, RCU takes care of the rest)
    * grabbing a file by index is lockless as well
    * removing a pointer is under ->file_lock, so's replacing it by dup2().

Grabbing a file by descriptor follows pointer from task_struct to descriptor
table, from descriptor table to element of array of pointers (embedded when
we have few descriptors, but becomes separately allocated when more is
needed), and from array element to struct file.  In struct file we fetch
->f_mode and (if descriptor table is shared) atomically increment ->f_count.
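
The pointer chase described above can be pictured with a toy userspace
model. Everything here is an assumption made for illustration: the toy_*
types and toy_fget() are hypothetical names, the mask argument is a made-up
way to reject files by ->f_mode, and RCU, table resizing and races against
close() are deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_file {
	unsigned int f_mode;
	atomic_long f_count;
};

struct toy_fdtable {
	unsigned int max_fds;
	struct toy_file **fd;	/* array of pointers to files */
};

struct toy_files {
	int shared;		/* nonzero when the table is shared */
	struct toy_fdtable *fdt;
};

struct toy_task {
	struct toy_files *files;
};

/*
 * task -> descriptor table -> array element -> file, then look at
 * ->f_mode and, only for a shared table, bump ->f_count atomically.
 */
static struct toy_file *toy_fget(struct toy_task *task, unsigned int fd,
				 unsigned int mask)
{
	struct toy_files *files = task->files;
	struct toy_fdtable *fdt = files->fdt;
	struct toy_file *file;

	if (fd >= fdt->max_fds)
		return NULL;

	file = fdt->fd[fd];
	if (!file || (file->f_mode & mask))
		return NULL;

	if (files->shared)
		atomic_fetch_add(&file->f_count, 1);

	return file;
}
```

The only write in this path lands in the file itself, which is the point
made earlier about the refcount living in a cacheline the caller is going
to touch anyway.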

For comparison, NetBSD has an extra level of indirection (with similar
tricks for embedding them while there are few descriptors), with a lot
fatter structure around the pointer to file - they keep close-on-exec and
in-use in there, along with refcount and their equivalent of waitqueue.
These structures, once they grow past the embedded set, are allocated
one-by-one, so copying the table on fork() costs a _lot_ more.  Rather
than an array of pointers to files they have an array of pointers to
those guys.  Reserving a descriptor triggers allocation of new struct
fdfile and installing a pointer to it into the array.  Allows for slightly
simpler installing of pointer to file afterwards - unlike us, they don't
have to be careful about array resize happening in parallel.

Grabbing a file by index is lockless, so's installing a pointer to file.
Reserving a descriptor is under ->fd_lock (mutex rather than a spinlock).
Removing a pointer is under ->fd_lock, so's replacing it by
dup2(), but dup2() has an unpleasant race (see upthread).

They do the same amount of pointer-chasing on lookup proper, but only because
they do not look into struct file itself there.  Which happens immediately
afterwards, since callers *will* look into what they've got.  I didn't look
into the details of barrier use in there, but it looks generally sane.

Cacheline pingpong is probably not a big deal there, but only because these
structures are fat and scattered.  Another fun issue is that they have
in-use bits buried deep, which means that they need to mirror them in
a separate

Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread Al Viro

On Thu, Oct 22, 2015 at 05:17:50AM +0100, Alan Burlison wrote:

> It's been said that the current mechanisms in Linux & some BSD
> variants can be subject to races

You do realize that it goes for the entire area?  And the races found
in this thread are in the BSD variant that tries to do something similar
to what you guys say Solaris is doing, so I'm not sure which way does
that argument go.  A high-level description with the same level of details
will be race-free in _all_ of them.  The devil is in the details, of course,
and historically they had been very easy to get wrong.  And extra complexity
in that area doesn't make things better.

>, and the behaviour exhibited
> doesn't conform to POSIX, for example requiring the use of
> shutdown() on unconnected sockets because close() doesn't kick off
> other threads accept()ing on the same fd.

Umm...  The old kernels (and even more - old userland) are not going to
disappear, so we are stuck with such uses of shutdown(2) anyway, POSIX or
no POSIX.

> I'd be interested to hear
> if there's a better and more performant way of handling the
> situation that doesn't involve doing the sort of bookkeeping Casper
> described.

So would a lot of other people.

> To quote one of my colleague's favourite sayings: Performance is a
> goal, correctness is a constraint.

Except that in this case "correctness" is the matter of rather obscure and
ill-documented areas in POSIX.  Don't get me wrong - this semantics isn't
inherently bad, but it's nowhere near being an absolute requirement.

[PATCH net-next v7 00/10] Add new drivers: qed & qede

2015-10-21 Thread Yuval Mintz

From: Ariel Elior 

This series implements the driver set for Qlogic's new QL4xxx series.
These are 10/20/25/40/50/100 Gig capable converged nics, supporting
ethernet (obviously), iscsi, fcoe, roce and iwarp protocols.

The overall driver design includes a common module ('qed') and protocol
specific dependent modules for ethernet ('qede'), fcoe ('qedf'),
iscsi ('qedi') and roce ('qedr').
The common module contains all of the common logic, e.g. initialization,
cleanup, infrastructure for interrupt handling, link management, slowpath
etc. as well as protocol agnostic features, and supplying an abstraction
layer for other modules.
The protocol specific modules can be compiled and operated independently
of each other, with the exception of the rdma modules which are dependent
on the ethernet module, in accordance with the kernel rdma stack design.

This series only adds the core and ethernet modules, with basic L2
capabilities. Future series will add the rest of the modules and enhance
the L2 functionality.

This patch series consists of the following patches:
qed:  Add module with basic common support
qed:  Add basic L2 interface
qede: Add basic Network driver
qed:  Add slowpath L2 support
qede: Add basic network device support
qede: Add classification configuration
qed:  Add link support
qede: Add support for link
qed:  Add statistics support
qede: Add basic ethtool support

This project is a team effort, thanks go to Yuval Mintz, Dmitry Kravkov,
Michal Kalderon, Tomer Tayar, Manish Chopra, Sudarsana Kalluru,
Rajesh Borundia, Sony Chacko, Artum Zolotushko, Harish Patil, Rasesh Mody,
Sergey Ukhterov and Elad Manela, as well as former team members,
Eilon Greenstein and Shmulik Ravid.

Changes from previous version:
------------------------------

From Version 6:
  - Reduced the number of arguments for functions with exceptionally
high number of parameters.

From Version 5:
  - Style change and fixes [mostly in patches 1, 4 and 7].
Thanks go to Francois Romieu, a mere mortal. ;-)

From Version 4:
  - Drop dependency for x86_64.

From Version 3:
  - Limit support of initial submission to x86_64.
  - Fix endian problems appearing via sparse [although no BE support yet].
  - Fix small issues suggested by the kbuild test robot.

From Version 2:
  - Removed U64_{HI,LO}; Using {upper,lower}_32_bits instead.
  - Use regular napi weight definition.
  - [We still use the __le variants for variables, since we didn't get
 a reply regarding the change into non-user API types].

From Version 1:
  - Removed private license file; Instead revised comments at source headers.

Thanks,
Ariel Elior

[PATCH net-next v7 08/10] qede: Add support for link

2015-10-21 Thread Yuval Mintz

From: Sudarsana Kalluru 

This adds basic link functionality to qede - driver still doesn't provide
users with an API to change any link property, but it does request qed to
initialize the link using default configuration, and registers a callback
that allows it to get link notifications.

This patch adds the ability of the driver to set the carrier as active and
to enable traffic as a result of async. link notifications.
Following this patch, driver should be capable of running traffic.

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 47 
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 3e38b22..e22e2ef 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -87,6 +87,7 @@ static int qede_probe(struct pci_dev *pdev, const struct 
pci_device_id *id);
 static void qede_remove(struct pci_dev *pdev);
 static int qede_alloc_rx_buffer(struct qede_dev *edev,
struct qede_rx_queue *rxq);
+static void qede_link_update(void *dev, struct qed_link_output *link);
 
 static struct pci_driver qede_pci_driver = {
.name = "qede",
@@ -95,6 +96,12 @@ static struct pci_driver qede_pci_driver = {
.remove = qede_remove,
 };
 
+static struct qed_eth_cb_ops qede_ll_ops = {
+   {
+   .link_update = qede_link_update,
+   },
+};
+
 static int qede_netdev_event(struct notifier_block *this, unsigned long event,
 void *ptr)
 {
@@ -1304,6 +1311,8 @@ static int __qede_probe(struct pci_dev *pdev, u32 
dp_module, u8 dp_level,
 
edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION);
 
+   edev->ops->register_ops(cdev, &qede_ll_ops, edev);
+
INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
mutex_init(&edev->qede_lock);
 
@@ -2099,6 +2108,7 @@ enum qede_unload_mode {
 
 static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode)
 {
+   struct qed_link_params link_params;
int rc;
 
DP_INFO(edev, "Starting qede unload\n");
@@ -2110,6 +2120,10 @@ static void qede_unload(struct qede_dev *edev, enum 
qede_unload_mode mode)
netif_tx_disable(edev->ndev);
netif_carrier_off(edev->ndev);
 
+   /* Reset the link */
+   memset(&link_params, 0, sizeof(link_params));
+   link_params.link_up = false;
+   edev->ops->common->set_link(edev->cdev, &link_params);
rc = qede_stop_queues(edev);
if (rc) {
qede_sync_free_irqs(edev);
@@ -2140,6 +2154,8 @@ enum qede_load_mode {
 
 static int qede_load(struct qede_dev *edev, enum qede_load_mode mode)
 {
+   struct qed_link_params link_params;
+   struct qed_link_output link_output;
int rc;
 
DP_INFO(edev, "Starting qede load\n");
@@ -2183,6 +2199,17 @@ static int qede_load(struct qede_dev *edev, enum 
qede_load_mode mode)
mutex_lock(&edev->qede_lock);
edev->state = QEDE_STATE_OPEN;
mutex_unlock(&edev->qede_lock);
+
+   /* Ask for link-up using current configuration */
+   memset(&link_params, 0, sizeof(link_params));
+   link_params.link_up = true;
+   edev->ops->common->set_link(edev->cdev, &link_params);
+
+   /* Query whether link is already-up */
+   memset(&link_output, 0, sizeof(link_output));
+   edev->ops->common->get_link(edev->cdev, &link_output);
+   qede_link_update(edev, &link_output);
+
DP_INFO(edev, "Ending successfully qede load\n");
 
return 0;
@@ -2223,6 +2250,26 @@ static int qede_close(struct net_device *ndev)
return 0;
 }
 
+static void qede_link_update(void *dev, struct qed_link_output *link)
+{
+   struct qede_dev *edev = dev;
+
+   if (!netif_running(edev->ndev)) {
+   DP_VERBOSE(edev, NETIF_MSG_LINK, "Interface is not running\n");
+   return;
+   }
+
+   if (link->link_up) {
+   DP_NOTICE(edev, "Link is up\n");
+   netif_tx_start_all_queues(edev->ndev);
+   netif_carrier_on(edev->ndev);
+   } else {
+   DP_NOTICE(edev, "Link is down\n");
+   netif_tx_disable(edev->ndev);
+   netif_carrier_off(edev->ndev);
+   }
+}
+
 static int qede_set_mac_addr(struct net_device *ndev, void *p)
 {
struct qede_dev *edev = netdev_priv(ndev);
-- 
1.9.3


[PATCH net-next v7 05/10] qede: Add basic network device support

2015-10-21 Thread Yuval Mintz

This patch includes the basic Rx/Tx support for the driver [although
carrier will still never be turned on].
Following this patch the driver registers a network device, initializes
it and prepares it for traffic.

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede.h  |  128 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 1807 ++
 2 files changed, 1935 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h 
b/drivers/net/ethernet/qlogic/qede/qede.h
index 7e2bcfa..424ef4a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -51,6 +51,7 @@ struct qede_dev {
 #define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \
 (edev)->dev_info.num_tc)
 
+   struct qede_fastpath    *fp_array;
u16 num_rss;
u8  num_tc;
 #define QEDE_RSS_CNT(edev) ((edev)->num_rss)
@@ -58,6 +59,9 @@ struct qede_dev {
 (edev)->num_tc)
 #define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss)
 #define QEDE_TC_IDX(edev, txqidx)  ((txqidx) / (edev)->num_rss)
+#define QEDE_TX_QUEUE(edev, txqidx) \
+   (&(edev)->fp_array[QEDE_TSS_IDX((edev), (txqidx))].txqs[QEDE_TC_IDX( \
+   (edev), (txqidx))])
 
struct qed_int_info int_info;
unsigned char   primary_mac[ETH_ALEN];
@@ -65,9 +69,133 @@ struct qede_dev {
/* Smaller private varaiant of the RTNL lock */
struct mutex qede_lock;
u32 state; /* Protected by qede_lock */
+   u16 rx_buf_size;
+   /* L2 header size + 2*VLANs (8 bytes) + LLC SNAP (8 bytes) */
+#define ETH_OVERHEAD   (ETH_HLEN + 8 + 8)
+   /* Max supported alignment is 256 (8 shift)
+* minimal alignment shift 6 is optimal for 57xxx HW performance
+*/
+#define QEDE_RX_ALIGN_SHIFT max(6, min(8, L1_CACHE_SHIFT))
+   /* We assume skb_build() uses sizeof(struct skb_shared_info) bytes
+* at the end of skb->data, to avoid wasting a full cache line.
+* This reduces memory use (skb->truesize).
+*/
+#define QEDE_FW_RX_ALIGN_END   \
+   max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT,  \
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+   struct qed_update_vport_rss_params  rss_params;
+   u16 q_num_rx_buffers; /* Must be a power of two */
+   u16 q_num_tx_buffers; /* Must be a power of two */
+};
+
+enum QEDE_STATE {
+   QEDE_STATE_CLOSED,
+   QEDE_STATE_OPEN,
+};
+
+#define HILO_U64(hi, lo)   ((((u64)(hi)) << 32) + (lo))
+
+#define MAX_NUM_TC  8
+#define MAX_NUM_PRI 8
+
+/* The driver supports the new build_skb() API:
+ * RX ring buffer contains pointer to kmalloc() data only,
+ * skb are built only after the frame was DMA-ed.
+ */
+struct sw_rx_data {
+   u8 *data;
+
+   DEFINE_DMA_UNMAP_ADDR(mapping);
+};
+
+struct qede_rx_queue {
+   __le16  *hw_cons_ptr;
+   struct sw_rx_data   *sw_rx_ring;
+   u16 sw_rx_cons;
+   u16 sw_rx_prod;
+   struct qed_chain rx_bd_ring;
+   struct qed_chain rx_comp_ring;
+   void __iomem *hw_rxq_prod_addr;
+
+   int rx_buf_size;
+
+   u16 num_rx_buffers;
+   u16 rxq_id;
+
+   u64 rx_hw_errors;
+   u64 rx_alloc_errors;
+};
+
+union db_prod {
+   struct eth_db_data data;
+   u32 raw;
+};
+
+struct sw_tx_bd {
+   struct sk_buff *skb;
+   u8 flags;
+/* Set on the first BD descriptor when there is a split BD */
+#define QEDE_TSO_SPLIT_BD  BIT(0)
+};
+
+struct qede_tx_queue {
+   int index; /* Queue index */
+   __le16  *hw_cons_ptr;
+   struct sw_tx_bd *sw_tx_ring;
+   u16 sw_tx_cons;
+   u16 sw_tx_prod;
+   struct qed_chain tx_pbl;
+   void __iomem *doorbell_addr;
+   union db_prod   tx_db;
+
+   u16 num_tx_buffers;
+};
+
+#define BD_UNMAP_ADDR(bd)  HILO_U64(le32_to_cpu((bd)->addr.hi), \
+le32_to_cpu((bd)->addr.lo))
+#define BD_SET_UNMAP_ADDR_LEN(bd, maddr, len)  \
+   do {\

[PATCH net-next v7 02/10] qed: Add basic L2 interface

2015-10-21 Thread Yuval Mintz

From: Manish Chopra 

This patch adds a public API for a network driver to work on top of QED.
The interface itself is very minimal - it's mostly infrastructure, as the
only content it has after this patch is a query for HW-based information
required for the creation of a network interface [I.e., no actual
protocol-specific configurations are supported].

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/Makefile  |   2 +-
 drivers/net/ethernet/qlogic/qed/qed.h |  14 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  62 +++
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |   1 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |  87 ++
 include/linux/qed/eth_common.h| 278 ++
 include/linux/qed/qed_eth_if.h|  38 
 7 files changed, 481 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_l2.c
 create mode 100644 include/linux/qed/eth_common.h
 create mode 100644 include/linux/qed/qed_eth_if.h

diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 5bbe0c7..dbe6938 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_QED) := qed.o
 
-qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o
+qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_l2.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 727f76a..e82fef1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -24,6 +24,7 @@
 #include 
 #include "qed_hsi.h"
 
+extern const struct qed_common_ops qed_common_ops_pass;
 #define DRV_MODULE_VERSION "8.4.0.0"
 
#define MAX_HWFNS_PER_DEVICE (4)
@@ -90,13 +91,22 @@ struct qed_qm_iids {
 
 enum QED_RESOURCES {
QED_SB,
+   QED_L2_QUEUE,
QED_VPORT,
+   QED_RSS_ENG,
QED_PQ,
QED_RL,
+   QED_MAC,
+   QED_VLAN,
QED_ILT,
QED_MAX_RESC,
 };
 
+enum QED_FEATURE {
+   QED_PF_L2_QUE,
+   QED_MAX_FEATURES,
+};
+
 struct qed_hw_info {
/* PCI personality */
enum qed_pci_personality personality;
@@ -104,6 +114,7 @@ struct qed_hw_info {
/* Resource Allocation scheme results */
u32 resc_start[QED_MAX_RESC];
u32 resc_num[QED_MAX_RESC];
+   u32 feat_num[QED_MAX_FEATURES];
 
 #define RESC_START(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_start[resc])
 #define RESC_NUM(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_num[resc])
@@ -265,6 +276,9 @@ struct qed_hwfn {
 
struct qed_mcp_info *mcp_info;
 
+   struct qed_hw_cid_data  *p_tx_cids;
+   struct qed_hw_cid_data  *p_rx_cids;
+
struct qed_dmae_info dmae_info;
 
/* QM init */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 4ba7a2d..5ed60f5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -92,6 +92,15 @@ void qed_resc_free(struct qed_dev *cdev)
for_each_hwfn(cdev, i) {
struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
+   kfree(p_hwfn->p_tx_cids);
+   p_hwfn->p_tx_cids = NULL;
+   kfree(p_hwfn->p_rx_cids);
+   p_hwfn->p_rx_cids = NULL;
+   }
+
+   for_each_hwfn(cdev, i) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+
qed_cxt_mngr_free(p_hwfn);
qed_qm_info_free(p_hwfn);
qed_spq_free(p_hwfn);
@@ -202,6 +211,29 @@ int qed_resc_alloc(struct qed_dev *cdev)
if (!cdev->fw_data)
return -ENOMEM;
 
+   /* Allocate Memory for the Queue->CID mapping */
+   for_each_hwfn(cdev, i) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+   int tx_size = sizeof(struct qed_hw_cid_data) *
+ RESC_NUM(p_hwfn, QED_L2_QUEUE);
+   int rx_size = sizeof(struct qed_hw_cid_data) *
+ RESC_NUM(p_hwfn, QED_L2_QUEUE);
+
+   p_hwfn->p_tx_cids = kzalloc(tx_size, GFP_KERNEL);
+   if (!p_hwfn->p_tx_cids) {
+   DP_NOTICE(p_hwfn,
+ "Failed to allocate memory for Tx Cids\n");
+   goto alloc_err;
+   }
+
+   p_hwfn->p_rx_cids = kzalloc(rx_size, GFP_KERNEL);
+   if (!p_hwfn->p_rx_cids) {
+   DP_NOTICE(p_hwfn,
+

[PATCH net-next v7 03/10] qede: Add basic Network driver

2015-10-21 Thread Yuval Mintz

The Qlogic Everest Driver for Ethernet is the Ethernet specific module for
QL4xxx ethernet products by Qlogic.

This patch adds a very minimal PCI driver, one that doesn't yet register
a network device, but one that does interact with qed and does a basic
initialization of the HW.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/Kconfig  |   5 +
 drivers/net/ethernet/qlogic/Makefile |   1 +
 drivers/net/ethernet/qlogic/qede/Makefile|   3 +
 drivers/net/ethernet/qlogic/qede/qede.h  |  73 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 354 +++
 5 files changed, 436 insertions(+)
 create mode 100644 drivers/net/ethernet/qlogic/qede/Makefile
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede.h
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_main.c

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 58c3fb3..30a6f24 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -97,4 +97,9 @@ config QED
---help---
  This enables the support for ...
 
+config QEDE
+   tristate "QLogic QED 25/40/100Gb Ethernet NIC"
+   depends on QED
+   ---help---
+ This enables the support for ...
 endif # NET_VENDOR_QLOGIC
diff --git a/drivers/net/ethernet/qlogic/Makefile b/drivers/net/ethernet/qlogic/Makefile
index 7600138..cee90e0 100644
--- a/drivers/net/ethernet/qlogic/Makefile
+++ b/drivers/net/ethernet/qlogic/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_QLCNIC) += qlcnic/
 obj-$(CONFIG_QLGE) += qlge/
 obj-$(CONFIG_NETXEN_NIC) += netxen/
 obj-$(CONFIG_QED) += qed/
+obj-$(CONFIG_QEDE) += qede/
diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile
new file mode 100644
index 000..bedfe9f
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_QEDE) := qede.o
+
+qede-y := qede_main.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
new file mode 100644
index 000..7e2bcfa
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -0,0 +1,73 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from the file COPYING in the main directory of
+* this source tree.
+*/
+
+#ifndef _QEDE_H_
+#define _QEDE_H_
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define QEDE_MAJOR_VERSION 8
+#define QEDE_MINOR_VERSION 4
+#define QEDE_REVISION_VERSION  0
+#define QEDE_ENGINEERING_VERSION   0
+#define DRV_MODULE_VERSION __stringify(QEDE_MAJOR_VERSION) "." \
+   __stringify(QEDE_MINOR_VERSION) "." \
+   __stringify(QEDE_REVISION_VERSION) "."  \
+   __stringify(QEDE_ENGINEERING_VERSION)
+
+#define QEDE_ETH_INTERFACE_VERSION 300
+
+#define DRV_MODULE_SYM qede
+
+struct qede_dev {
+   struct qed_dev  *cdev;
+   struct net_device   *ndev;
+   struct pci_dev  *pdev;
+
+   u32 dp_module;
+   u8  dp_level;
+
+   const struct qed_eth_ops*ops;
+
+   struct qed_dev_eth_info dev_info;
+#define QEDE_MAX_RSS_CNT(edev) ((edev)->dev_info.num_queues)
+#define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \
+(edev)->dev_info.num_tc)
+
+   u16 num_rss;
+   u8  num_tc;
+#define QEDE_RSS_CNT(edev) ((edev)->num_rss)
+#define QEDE_TSS_CNT(edev) ((edev)->num_rss *  \
+(edev)->num_tc)
+#define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss)
+#define QEDE_TC_IDX(edev, txqidx)  ((txqidx) / (edev)->num_rss)
+
+   struct qed_int_info int_info;
+   unsigned char   primary_mac[ETH_ALEN];
+
+   /* Smaller private varaiant of the RTNL lock */
+   struct mutex qede_lock;
+   u32 state; /* Protected by qede_lock */
+};
+
+/* Debug print definitions */
+#define DP_NAME(edev) ((edev)->ndev->name)
+
+#endif /* _QEDE_H_ */
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
new file mode 100644
index 000..35065dc
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -0,0 +1,354 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from

[PATCH net-next v7 04/10] qed: Add slowpath L2 support

2015-10-21 Thread Yuval Mintz

From: Manish Chopra 

This patch adds to the qed the support to configure various L2 elements,
such as channels and basic filtering conditions.
It also enhances its public API to allow qede to later utilize this
functionality.

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  114 ++
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |   58 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  294 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c  | 1613 +
 drivers/net/ethernet/qlogic/qed/qed_main.c|   10 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.c |   16 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.h |   13 +
 drivers/net/ethernet/qlogic/qed/qed_sp.h  |   27 +
 drivers/net/ethernet/qlogic/qed/qed_spq.c |   29 +
 include/linux/qed/qed_eth_if.h|  120 ++
 10 files changed, 2294 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 5ed60f5..c94745a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -799,6 +799,60 @@ int qed_hw_stop(struct qed_dev *cdev)
return rc;
 }
 
+void qed_hw_stop_fastpath(struct qed_dev *cdev)
+{
+   int i, j;
+
+   for_each_hwfn(cdev, j) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[j];
+   struct qed_ptt *p_ptt   = p_hwfn->p_main_ptt;
+
+   DP_VERBOSE(p_hwfn,
+  NETIF_MSG_IFDOWN,
+  "Shutting down the fastpath\n");
+
+   qed_wr(p_hwfn, p_ptt,
+  NIG_REG_RX_LLH_BRB_GATE_DNTFWD_PERPF, 0x1);
+
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_TCP, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_UDP, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_FCOE, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_ROCE, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_OPENFLOW, 0x0);
+
+   qed_wr(p_hwfn, p_ptt, TM_REG_PF_ENABLE_CONN, 0x0);
+   qed_wr(p_hwfn, p_ptt, TM_REG_PF_ENABLE_TASK, 0x0);
+   for (i = 0; i < QED_HW_STOP_RETRY_LIMIT; i++) {
+   if ((!qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_CONN)) &&
+   (!qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_TASK)))
+   break;
+
+   usleep_range(1000, 2000);
+   }
+   if (i == QED_HW_STOP_RETRY_LIMIT)
+   DP_NOTICE(p_hwfn,
+ "Timers linear scans are not over [Connection %02x Tasks %02x]\n",
+ (u8)qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_CONN),
+ (u8)qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_TASK));
+
+   qed_int_igu_init_pure_rt(p_hwfn, p_ptt, false, false);
+
+   /* Need to wait 1ms to guarantee SBs are cleared */
+   usleep_range(1000, 2000);
+   }
+}
+
+void qed_hw_start_fastpath(struct qed_hwfn *p_hwfn)
+{
+   /* Re-open incoming traffic */
+   qed_wr(p_hwfn, p_hwfn->p_main_ptt,
+  NIG_REG_RX_LLH_BRB_GATE_DNTFWD_PERPF, 0x0);
+}
+
 static int qed_reg_assert(struct qed_hwfn *hwfn,
  struct qed_ptt *ptt, u32 reg,
  bool expected)
@@ -1348,3 +1402,63 @@ void qed_chain_free(struct qed_dev *cdev,
  p_chain->p_virt_addr,
  p_chain->p_phys_addr);
 }
+
+int qed_fw_l2_queue(struct qed_hwfn *p_hwfn,
+   u16 src_id, u16 *dst_id)
+{
+   if (src_id >= RESC_NUM(p_hwfn, QED_L2_QUEUE)) {
+   u16 min, max;
+
+   min = (u16)RESC_START(p_hwfn, QED_L2_QUEUE);
+   max = min + RESC_NUM(p_hwfn, QED_L2_QUEUE);
+   DP_NOTICE(p_hwfn,
+ "l2_queue id [%d] is not valid, available indices [%d - %d]\n",
+ src_id, min, max);
+
+   return -EINVAL;
+   }
+
+   *dst_id = RESC_START(p_hwfn, QED_L2_QUEUE) + src_id;
+
+   return 0;
+}
+
+int qed_fw_vport(struct qed_hwfn *p_hwfn,
+u8 src_id, u8 *dst_id)
+{
+   if (src_id >= RESC_NUM(p_hwfn, QED_VPORT)) {
+   u8 min, max;
+
+   min = (u8)RESC_START(p_hwfn, QED_VPORT);
+   max = min + RESC_NUM(p_hwfn, QED_VPORT);
+   DP_NOTICE(p_hwfn,
+ "vport id [%d] is not valid, available indices [%d - %d]\n",
+ src_id, min, max);
+
+   return -EINVAL;
+   }
+
+

Re: [PATCH net-next v4 0/2] mpls: multipath support

2015-10-21 Thread Eric W. Biederman

Roopa Prabhu  writes:

> From: Roopa Prabhu 
>
> This patch adds support for MPLS multipath routes.
>
> Includes following changes to support multipath:
> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.
>
> - struct mpls_nh represents a mpls nexthop label forwarding entry
>
> - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
> multipath routes similar to ipv4/v6 fib
>
> - In the process of restructuring, this patch also consistently changes all
> labels to u8
>
> $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
>   nexthop as 700 via inet 10.1.1.6 dev swp2 \
>   nexthop as 800 via inet 40.1.1.2 dev swp3
>
> $ip  -f mpls route show
> 100 
>   nexthop as to 200 via inet 10.1.1.2  dev swp1
>   nexthop as to 700 via inet 10.1.1.6  dev swp2
>   nexthop as to 800 via inet 40.1.1.2  dev swp3
>
> Roopa Prabhu (1):
>   mpls: multipath support
>
> Robert Shearman (1):
>   mpls: flow-based multipath selection
>
> Signed-off-by: Roopa Prabhu 
>
> 
> v2:
>   - Incorporate some feedback from Robert:
>   use dynamic allocation (list) instead of static allocation
>   for nexthops
> v3:
>   - Move back to arrays (same as v1), also suggested by Eric Biederman
>
> v4:
>   - address a few comments from Eric Biederman
>   Plan to address the following pending comments in incremental patches 
> after this
>   infrastructure changes go in.
>   - Move VIA size to 16 bytes
>   - use ipv6 flow label in ecmp calculations
>   - dead route handling during multipath route selection (I had planned 
> this in
>   an incremental patch initially).

I don't see anything problematic in the patches. The worst I found is
dead code, and we can delete that later, so for purposes of moving
forward I say:

Acked-by: "Eric W. Biederman" 

That said we really need dead path handling.  Without handling paths
that go dead this functionality really is pretty much broken.  So if you
can't get that by the merge window we will need to apply a patch to
disable processing of the RTA_MULTIPATH netlink attribute.

Eric

[PATCH] net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set

2015-10-21 Thread David Ahern

741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
oif is given. Hajime reported that this change breaks the Mobile IPv6
use case that wants to force the message through one interface yet use
the source address from another interface. Handle this case by only
adding the flag if oif is set and saddr is not set.

Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
Cc: Hajime Tazaki 
Signed-off-by: David Ahern 
---
This is needed for 4.3.

 net/ipv6/route.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d0619632723a..2701cb3d88e9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1171,6 +1171,7 @@ struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
 {
struct dst_entry *dst;
int flags = 0;
+   bool any_src;
 
dst = l3mdev_rt6_dst_by_oif(net, fl6);
if (dst)
@@ -1178,11 +1179,12 @@ struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
 
fl6->flowi6_iif = LOOPBACK_IFINDEX;
 
+   any_src = ipv6_addr_any(&fl6->saddr);
if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr) ||
-   fl6->flowi6_oif)
+   (fl6->flowi6_oif && any_src))
flags |= RT6_LOOKUP_F_IFACE;
 
-   if (!ipv6_addr_any(&fl6->saddr))
+   if (!any_src)
flags |= RT6_LOOKUP_F_HAS_SADDR;
else if (sk)
flags |= rt6_srcprefs2flags(inet6_sk(sk)->srcprefs);
-- 
1.9.1
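
The flag selection this patch describes can be sketched as a small userspace function. This is only an illustration under stated assumptions: the function name, the boolean parameters, and the flag values below are hypothetical stand-ins, not the kernel's definitions, and the source-preference branch of ip6_route_output() is left out.

```c
#include <stdbool.h>

/* Illustrative flag values only; the real RT6_LOOKUP_F_* constants
 * are defined in the kernel headers and may differ. */
#define DEMO_LOOKUP_F_IFACE     0x1
#define DEMO_LOOKUP_F_HAS_SADDR 0x2

/* Hypothetical helper mirroring the patched condition:
 *   bound_dev   - socket is bound to a device
 *   need_strict - destination address requires a strict interface match
 *   oif_set     - caller supplied an output interface
 *   any_src     - source address is unspecified (the "any" address)
 */
int demo_route_lookup_flags(bool bound_dev, bool need_strict,
			    bool oif_set, bool any_src)
{
	int flags = 0;

	/* An output interface alone no longer forces a strict match;
	 * it does so only when no source address was given. */
	if (bound_dev || need_strict || (oif_set && any_src))
		flags |= DEMO_LOOKUP_F_IFACE;

	if (!any_src)
		flags |= DEMO_LOOKUP_F_HAS_SADDR;

	return flags;
}
```

With oif set and a source address supplied (the Mobile IPv6 case from the description), the sketch returns only the HAS_SADDR flag, so an interface mismatch would not be fatal.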


[PATCH v2 net-next] xfrm: Fix unaligned access to stats in copy_to_user_state()

2015-10-21 Thread Sowmini Varadhan


On sparc, deleting established SAs (e.g., by restarting ipsec)
results in unaligned access messages via xfrm_del_sa -> 
km_state_notify -> xfrm_send_state_notify().

Even though struct xfrm_usersa_info is aligned on 8-byte boundaries,
netlink attributes are fundamentally only 4 byte aligned, and this
cannot be changed for nla_data() that is passed up to userspace.
As a result, the put_unaligned() macro needs to be used to
set up potentially unaligned fields such as the xfrm_stats in
copy_to_user_state()

Signed-off-by: Sowmini Varadhan 
---
v2: review comment from thread: cannot use PTR_ALIGN as this would break
userspace assumptions about 4 byte alignment. Use *_unaligned() macros
as needed, instead.

 net/xfrm/xfrm_user.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8de9e3..639e0d5 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -31,6 +31,7 @@
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
 #endif
+#include 
 
 static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
@@ -728,7 +729,9 @@ static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
memcpy(&p->sel, &x->sel, sizeof(p->sel));
memcpy(&p->lft, &x->lft, sizeof(p->lft));
memcpy(&p->curlft, &x->curlft, sizeof(p->curlft));
-   memcpy(&p->stats, &x->stats, sizeof(p->stats));
+   put_unaligned(x->stats.replay_window, &p->stats.replay_window);
+   put_unaligned(x->stats.replay, &p->stats.replay);
+   put_unaligned(x->stats.integrity_failed, &p->stats.integrity_failed);
memcpy(&p->saddr, &x->props.saddr, sizeof(p->saddr));
p->mode = x->props.mode;
p->replay_window = x->props.replay_window;
-- 
1.7.1
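
For readers outside the kernel, the idea behind put_unaligned() can be approximated in portable userspace C. The sketch below is an assumption-laden stand-in, not the kernel macro: the helper names are hypothetical, and it relies on memcpy() through an untyped pointer so that the compiler cannot assume the destination is aligned for a 32-bit store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace analogue of put_unaligned() for a 32-bit value.
 * Copying through a void pointer makes no alignment promise to the
 * compiler, so it must emit code that is safe at any address. */
void demo_store_u32_unaligned(void *dst, uint32_t val)
{
	memcpy(dst, &val, sizeof(val));
}

/* Matching load, again without assuming any alignment of src. */
uint32_t demo_load_u32_unaligned(const void *src)
{
	uint32_t val;

	memcpy(&val, src, sizeof(val));
	return val;
}
```

Storing at an odd offset of a byte buffer, for example `demo_store_u32_unaligned(buf + 1, v)`, is the kind of access that would fault or be trapped on a strict-alignment machine such as sparc if it were done with a plain 32-bit store.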


Re: [PATCH] net: dsa: mv88e6060: Fix false positive lockdep splat

2015-10-21 Thread Andrew Lunn

On Wed, Oct 21, 2015 at 05:37:45PM +0200, Neil Armstrong wrote:
> Like the change made for mv88e6xxx, use mutex_lock_nested() to avoid
> lockdep to give false positives because of nested MDIO busses.

Hi Neil

We now have three instances of this, since mdio-mux.c has the same
code. Maybe now would be a good time to refactor this code into
mdiobus_read_nested() and mdiobus_write_nested() in mdio_bus.c?  At
the same time, add BUG_ON(in_interrupt()) similar to the non-nested
versions?

Andrew

[RFC Patch 11/12] IXGBEVF: Migrate VF statistic data

2015-10-21 Thread Lan Tianyu

VF statistics registers are read-only, so they cannot be migrated by
writing the saved values back directly.

Currently, the statistics the driver returns to user space are not the raw
register values. The VF driver records the register values as base data
when the net interface is brought up or opened, computes how much the
registers increased during the last period of online service, and adds that
to the saved_reset data. When user space collects statistics, the VF driver
returns "current - base + saved_reset", where "current" is the register
value at that point.

Restoring the net function after migration is just like bringing the net
interface up or opening it. Call the existing functions to update the base
and saved_reset data so that the statistics stay continuous across migration.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 04b6ce7..d22160f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3005,6 +3005,7 @@ int ixgbevf_live_mg(struct ixgbevf_adapter *adapter)
return 0;
 
del_timer_sync(&adapter->service_timer);
+   ixgbevf_update_stats(adapter);
pr_info("migration start\n");
migration_status = MIGRATION_IN_PROGRESS; 
 
@@ -3017,6 +3018,8 @@ int ixgbevf_live_mg(struct ixgbevf_adapter *adapter)
return 1;
 
ixgbevf_restore_state(adapter);
+   ixgbevf_save_reset_stats(adapter);
+   ixgbevf_init_last_counter_stats(adapter);
migration_status = MIGRATION_COMPLETED;
pr_info("migration end\n");
return 0;
-- 
1.8.4.rc0.1.g8f6a3e5.dirty
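
The "current - base + saved_reset" accounting from the description above can be sketched in userspace C. This is a simplified model under assumptions: the struct and function names are hypothetical and do not come from the ixgbevf source, and a single counter stands in for the whole set of statistics registers.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one read-only hardware counter. */
struct demo_vf_counter {
	uint64_t base;        /* register value sampled when the interface came up */
	uint64_t saved_reset; /* total accumulated over earlier online periods */
};

/* Value reported to user space: current - base + saved_reset. */
uint64_t demo_vf_counter_report(const struct demo_vf_counter *c,
				uint64_t current)
{
	return current - c->base + c->saved_reset;
}

/* Fold the period that is ending into saved_reset
 * (done before the register contents are lost). */
void demo_vf_counter_save(struct demo_vf_counter *c, uint64_t current)
{
	c->saved_reset += current - c->base;
}

/* Start a new period from whatever the register reads now
 * (done after the device is restored on the destination). */
void demo_vf_counter_rebase(struct demo_vf_counter *c, uint64_t current)
{
	c->base = current;
}
```

In this model a counter that reported 50 before migration still reports 50 right after it, whatever the register on the destination happens to read, because the old period is saved and the base is re-sampled.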


[RFC Patch 01/12] PCI: Add virtfn_index for struct pci_device

2015-10-21 Thread Lan Tianyu

Add a "virtfn_index" member to struct pci_dev to record the VF's index
within its PF. This will be used by the VF sysfs node handling.

Signed-off-by: Lan Tianyu 
---
 drivers/pci/iov.c   | 1 +
 include/linux/pci.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..065b6bb 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -136,6 +136,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
virtfn->physfn = pci_dev_get(dev);
virtfn->is_virtfn = 1;
virtfn->multifunction = 0;
+   virtfn->virtfn_index = id;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &dev->resource[i + PCI_IOV_RESOURCES];
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..85c5531 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -356,6 +356,7 @@ struct pci_dev {
unsigned int io_window_1k:1; /* Intel P2P bridge 1K I/O windows */
unsigned int irq_managed:1;
pci_dev_flags_t dev_flags;
+   unsigned int virtfn_index;
atomic_t enable_cnt; /* pci_enable_device has been called */
 
u32 saved_config_space[16]; /* config space saved at suspend time */
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


[PATCH net-next v2 2/3] switchdev: fix: pass correct obj size when deferring obj add/del

2015-10-21 Thread sfeldma

From: Scott Feldman 

Fixes: 4d429c5dd ("switchdev: introduce possibility to defer obj_add/del")
Signed-off-by: Scott Feldman 
---
v1->v2: use correct "Fixes" tag, use common func to calc obj size for add/del

 net/switchdev/switchdev.c |   19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 56d8479..bff8e2b 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -336,6 +336,21 @@ int switchdev_port_attr_set(struct net_device *dev,
 }
 EXPORT_SYMBOL_GPL(switchdev_port_attr_set);
 
+static size_t switchdev_obj_size(const struct switchdev_obj *obj)
+{
+   switch (obj->id) {
+   case SWITCHDEV_OBJ_ID_PORT_VLAN:
+   return sizeof(struct switchdev_obj_port_vlan);
+   case SWITCHDEV_OBJ_ID_IPV4_FIB:
+   return sizeof(struct switchdev_obj_ipv4_fib);
+   case SWITCHDEV_OBJ_ID_PORT_FDB:
+   return sizeof(struct switchdev_obj_port_fdb);
+   default:
+   BUG();
+   }
+   return 0;
+}
+
 static int __switchdev_port_obj_add(struct net_device *dev,
const struct switchdev_obj *obj,
struct switchdev_trans *trans)
@@ -421,7 +436,7 @@ static void switchdev_port_obj_add_deferred(struct net_device *dev,
 static int switchdev_port_obj_add_defer(struct net_device *dev,
const struct switchdev_obj *obj)
 {
-   return switchdev_deferred_enqueue(dev, obj, sizeof(*obj),
+   return switchdev_deferred_enqueue(dev, obj, switchdev_obj_size(obj),
  switchdev_port_obj_add_deferred);
 }
 
@@ -489,7 +504,7 @@ static void switchdev_port_obj_del_deferred(struct net_device *dev,
 static int switchdev_port_obj_del_defer(struct net_device *dev,
const struct switchdev_obj *obj)
 {
-   return switchdev_deferred_enqueue(dev, obj, sizeof(*obj),
+   return switchdev_deferred_enqueue(dev, obj, switchdev_obj_size(obj),
  switchdev_port_obj_del_deferred);
 }
 
-- 
1.7.10.4
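
Why sizeof(*obj) is too small when the object is embedded in a larger struct can be shown with a short userspace sketch. All names here are hypothetical demo types, not the switchdev definitions; only the pattern (a common header struct embedded first in each specific object, and a size lookup keyed on its id) follows the patch.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_obj_id { DEMO_OBJ_VLAN, DEMO_OBJ_FDB };

/* Common header embedded at the start of every specific object. */
struct demo_obj {
	enum demo_obj_id id;
};

struct demo_obj_vlan {
	struct demo_obj obj;
	uint16_t vid_begin;
	uint16_t vid_end;
};

struct demo_obj_fdb {
	struct demo_obj obj;
	unsigned char addr[6];
	uint16_t vid;
};

/* Size of the containing object, selected by the id in the header.
 * Copying only sizeof(struct demo_obj) bytes would drop the payload. */
size_t demo_obj_size(const struct demo_obj *obj)
{
	switch (obj->id) {
	case DEMO_OBJ_VLAN:
		return sizeof(struct demo_obj_vlan);
	case DEMO_OBJ_FDB:
		return sizeof(struct demo_obj_fdb);
	}
	return 0; /* unknown id; the kernel patch uses BUG() here instead */
}
```

A deferred copy that uses demo_obj_size() carries the VLAN range or FDB address along with the header, which is the property the patch restores for the add and del paths.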


[PATCH net-next v2 1/3] switchdev: fix: erasing too much of vlan obj when handling multiple vlan specs

2015-10-21 Thread sfeldma

From: Scott Feldman 

When adding vlans with multiple IFLA_BRIDGE_VLAN_INFO attrs set in AFSPEC,
we would wipe the vlan obj struct after the first IFLA_BRIDGE_VLAN_INFO.
Fix this by only clearing what's necessary on each IFLA_BRIDGE_VLAN_INFO
iteration.

Fixes: 9e8f4a54 ("switchdev: push object ID back to object structure")
Signed-off-by: Scott Feldman 
Acked-by: Jiri Pirko 
---
v1->v2: add Jiri's Acked-by

 net/switchdev/switchdev.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 73e3895..56d8479 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -863,7 +863,7 @@ static int switchdev_port_br_afspec(struct net_device *dev,
err = f(dev, &vlan.obj);
if (err)
return err;
-   memset(&vlan, 0, sizeof(vlan));
+   vlan.vid_begin = 0;
} else {
if (vlan.vid_begin)
return -EINVAL;
@@ -872,7 +872,7 @@ static int switchdev_port_br_afspec(struct net_device *dev,
err = f(dev, &vlan.obj);
if (err)
return err;
-   memset(&vlan, 0, sizeof(vlan));
+   vlan.vid_begin = 0;
}
}
 
-- 
1.7.10.4
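
The difference between the old and new reset can be illustrated with a small userspace sketch. The struct layout and names are hypothetical; in particular, treating the object ID as the field that must survive between iterations is an assumption drawn from the Fixes tag, not something the patch text states outright.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the vlan object: an ID set once at
 * initialisation plus per-iteration range fields. */
struct demo_vlan {
	int obj_id;
	uint16_t flags;
	uint16_t vid_begin;
	uint16_t vid_end;
};

/* Old behaviour: wipes the whole struct, including obj_id. */
void demo_reset_all(struct demo_vlan *v)
{
	memset(v, 0, sizeof(*v));
}

/* New behaviour: clears only the range marker for the next iteration. */
void demo_reset_range(struct demo_vlan *v)
{
	v->vid_begin = 0;
}
```

After demo_reset_all() a second attribute would be processed with obj_id zeroed, whereas demo_reset_range() leaves the fields set at initialisation intact.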

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next v2 3/3] switchdev: split switchdev_attr into individual structs

2015-10-21 Thread sfeldma

From: Scott Feldman 

This was already done for switchdev_objs.   Changing switchdev_attrs to new
style makes switchdev API consistent for both attrs and objs.

No functional changes here.

Signed-off-by: Scott Feldman 
Acked-by: Jiri Pirko 
---
v1->v2: add Jiri's Acked-by

 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   24 --
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |7 +-
 drivers/net/ethernet/rocker/rocker.c   |   23 --
 include/net/switchdev.h|   51 +++--
 net/bridge/br_stp.c|   24 +++---
 net/core/net-sysfs.c   |   14 ++--
 net/core/rtnetlink.c   |   14 ++--
 net/dsa/slave.c|   10 ++-
 net/switchdev/switchdev.c  |   77 +---
 9 files changed, 172 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index c39b7a1..efa1aa8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -56,15 +56,19 @@ static int mlxsw_sp_port_attr_get(struct net_device *dev,
 {
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   struct switchdev_attr_port_parent_id *parent_id;
+   struct switchdev_attr_port_bridge_flags *brport_flags;
 
switch (attr->id) {
case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
-   attr->u.ppid.id_len = sizeof(mlxsw_sp->base_mac);
-   memcpy(&attr->u.ppid.id, &mlxsw_sp->base_mac,
-  attr->u.ppid.id_len);
+   parent_id = SWITCHDEV_ATTR_PORT_PARENT_ID(attr);
+   parent_id->ppid.id_len = sizeof(mlxsw_sp->base_mac);
+   memcpy(&parent_id->ppid.id, &mlxsw_sp->base_mac,
+  parent_id->ppid.id_len);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
-   attr->u.brport_flags =
+   brport_flags = SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS(attr);
+   brport_flags->brport_flags =
(mlxsw_sp_port->learning ? BR_LEARNING : 0) |
(mlxsw_sp_port->learning_sync ? BR_LEARNING_SYNC : 0);
break;
@@ -166,20 +170,26 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
  struct switchdev_trans *trans)
 {
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+   struct switchdev_attr_port_stp_state *stp_state;
+   struct switchdev_attr_port_bridge_flags *brport_flags;
+   struct switchdev_attr_bridge_ageing_time *ageing_time;
int err = 0;
 
switch (attr->id) {
case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
+   stp_state = SWITCHDEV_ATTR_PORT_STP_STATE(attr);
err = mlxsw_sp_port_attr_stp_state_set(mlxsw_sp_port, trans,
-  attr->u.stp_state);
+  stp_state->state);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
+   brport_flags = SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS(attr);
err = mlxsw_sp_port_attr_br_flags_set(mlxsw_sp_port, trans,
- attr->u.brport_flags);
+ brport_flags->brport_flags);
break;
case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
+   ageing_time = SWITCHDEV_ATTR_BRIDGE_AGEING_TIME(attr);
err = mlxsw_sp_port_attr_br_ageing_set(mlxsw_sp_port, trans,
-  attr->u.ageing_time);
+ ageing_time->ageing_time);
break;
default:
err = -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index 2fd2279..edabc82 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -864,11 +864,14 @@ static int mlxsw_sx_port_attr_get(struct net_device *dev,
 {
struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+   struct switchdev_attr_port_parent_id *parent_id;
 
switch (attr->id) {
case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
-   attr->u.ppid.id_len = sizeof(mlxsw_sx->hw_id);
-   memcpy(&attr->u.ppid.id, &mlxsw_sx->hw_id, attr->u.ppid.id_len);
+   parent_id = SWITCHDEV_ATTR_PORT_PARENT_ID(attr);
+   parent_id->ppid.id_len = sizeof(mlxsw_sx->hw_id);
+   memcpy(&parent_id->ppid.id, &mlxsw_sx->hw_id,
+

Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-21 Thread Peter Zijlstra

On Wed, Oct 21, 2015 at 11:06:47PM +0800, pi3orama wrote:
> > So explain; how does this eBPF stuff work.
> 
> I think I get your point this time, and let me explain the eBPF stuff to you.
> 
> You are aware that BPF programmer can break the system in this way:
> 
> A=get_non_local_perf_event()
> perf_event_read_local(A)
> BOOM!
> 
> However, the above logic is impossible because a BPF program can't work
> this way.
> 
> First of all, it is impossible for a BPF program to directly invoke a
> kernel function. Unlike a kernel module, a BPF program can only invoke
> functions designed for it, like the one this patch adds. So the
> abilities of BPF programs are strictly restricted by the kernel. If we
> don't allow a BPF program to call perf_event_read_local() across cores,
> we can check for this and return an error in the function we provide.
> 
> Second, there's no way for a BPF program to directly access a perf event.
> All perf events have to be wrapped in a map and accessed through the BPF
> functions described above. We don't allow a BPF program to fetch an array
> element from that map, so pointers to perf events are safely protected
> from the BPF program.
> 
> In summary, your either-or logic doesn't hold in the BPF world. A BPF
> program can only access a perf event in a highly restricted way. We
> don't allow it to call perf_event_read_local() across cores, so it
> can't.

Urgh, that's still horridly inconsistent. Can we please come up with a
consistent interface to perf?
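
For readers following the thread: the restriction described in the quoted
text (the program only ever passes a map and an index to a helper, and the
helper refuses events that are not local to the current CPU) could be
sketched in user space roughly as below. Every name here (demo_perf_event,
demo_event_map, demo_read_event) is hypothetical and chosen for illustration;
this is not the kernel's API and not the code from the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a perf event: only what the check needs. */
struct demo_perf_event {
    int      cpu;    /* CPU the event is bound to */
    uint64_t count;  /* current counter value */
};

/* Hypothetical stand-in for a PERF_EVENT_ARRAY map. */
struct demo_event_map {
    struct demo_perf_event *slots;
    size_t                  nr_slots;
};

/*
 * Helper exposed to the program. The caller never sees an event pointer,
 * only a map and an index. Returns 0 and stores the counter on success,
 * -E2BIG for an out-of-range index, -EINVAL for a non-local event.
 */
int demo_read_event(const struct demo_event_map *map, size_t index,
                    int current_cpu, uint64_t *out)
{
    assert(map != NULL && out != NULL);

    if (index >= map->nr_slots)
        return -E2BIG;
    if (map->slots[index].cpu != current_cpu)
        return -EINVAL;  /* refuse the cross-CPU read instead of crashing */

    *out = map->slots[index].count;
    return 0;
}
```

The point of the sketch is only that the locality check lives in the helper,
so the "read a non-local event" sequence ends in an error code rather than a
splat.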

Re: [PATCH 05/15] net: wireless: ti: Return flow can be simplified for wl1271_cmd_interrogate

2015-10-21 Thread kbuild test robot

Hi Punit,

[auto build test WARNING on net/master -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Punit-Vara/Fix-warnings-reported-by-coccicheck/20151021-230937
config: x86_64-allyesconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/wireless//ti/wlcore/acx.c: In function 'wl1271_acx_mem_map':
>> drivers/net/wireless//ti/wlcore/acx.c:161:6: warning: unused variable 'ret' [-Wunused-variable]
 int ret;
 ^

vim +/ret +161 drivers/net/wireless//ti/wlcore/acx.c

f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  145
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  146      ret = wl1271_cmd_configure(wl, ACX_FEATURE_CFG,
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  147                                 feature, sizeof(*feature));
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  148      if (ret < 0) {
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  149          wl1271_error("Couldnt set HW encryption");
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  150          goto out;
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  151      }
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  152
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  153  out:
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  154      kfree(feature);
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  155      return ret;
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  156  }
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  157
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  158  int wl1271_acx_mem_map(struct wl1271 *wl, struct acx_header *mem_map,
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  159                         size_t len)
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  160  {
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06 @161      int ret;
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  162
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  163      wl1271_debug(DEBUG_ACX, "acx mem map");
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  164
1686f7fd1 drivers/net/wireless/ti/wlcore/acx.c     Punit Vara        2015-10-21  165      return wl1271_cmd_interrogate(wl, ACX_MEM_MAP, mem_map,
4b6741443 drivers/net/wireless/ti/wlcore/acx.c     Igal Chernobelsky 2013-09-09  166                                    sizeof(struct acx_header), len);
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  167  }
f5fc0f86b drivers/net/wireless/wl12xx/wl1271_acx.c Luciano Coelho    2009-08-06  168
8793f9bb1 drivers/net/wireless/wl12xx/wl1271_acx.c Juuso Oikarinen   2009-10-13  169  int wl1271_acx_rx_msdu_life_time(struct wl1271 *wl)

:: The code at line 161 was first introduced by commit
:: f5fc0f86b02afef1119b523623b4cde41475bc8c wl1271: add wl1271 driver files

:: TO: Luciano Coelho <luciano.coe...@nokia.com>
:: CC: John W. Linville <linvi...@tuxdriver.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[RFC Patch 02/12] IXGBE: Add new mail box event to restore VF status in the PF driver

2015-10-21 Thread Lan Tianyu

This patch restores VF status in the PF driver when the PF receives the
corresponding event from the VF.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 40 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 636f9e3..9d5669a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -148,6 +148,7 @@ struct vf_data_storage {
bool pf_set_mac;
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
u16 pf_qos;
+   u32 vf_lpe;
u16 tx_rate;
u16 vlan_count;
u8 spoofchk_enabled;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index b1e4703..8fdb38d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -91,6 +91,7 @@ enum ixgbe_pfvf_api_rev {
 
 /* mailbox API, version 1.1 VF requests */
 #define IXGBE_VF_GET_QUEUES0x09 /* get queue configuration */
+#define IXGBE_VF_NOTIFY_RESUME0x0c /* VF notify PF migration finishing */
 
 /* GET_QUEUES return data indices within the mailbox */
 #define IXGBE_VF_TX_QUEUES 1   /* number of Tx queues supported */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 1d17b58..ab2a2e2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -648,6 +648,42 @@ static inline void ixgbe_write_qde(struct ixgbe_adapter *adapter, u32 vf,
}
 }
 
+/**
+ *  Restore the settings by mailbox, after migration
+ **/
+void ixgbe_restore_setting(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 reg, reg_offset, vf_shift;
+   int rar_entry = hw->mac.num_rar_entries - (vf + 1);
+
+   vf_shift = vf % 32;
+   reg_offset = vf / 32;
+
+   /* enable transmit and receive for vf */
+   reg = IXGBE_READ_REG(hw, IXGBE_VFTE(reg_offset));
+   reg |= (1 << vf_shift);
+   IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), reg);
+
+   reg = IXGBE_READ_REG(hw, IXGBE_VFRE(reg_offset));
+   reg |= (1 << vf_shift);
+   IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), reg);
+
+   reg = IXGBE_READ_REG(hw, IXGBE_VMECM(reg_offset));
+   reg |= (1 << vf_shift);
+   IXGBE_WRITE_REG(hw, IXGBE_VMECM(reg_offset), reg);
+
+   ixgbe_vf_reset_event(adapter, vf);
+
+   hw->mac.ops.set_rar(hw, rar_entry,
+   adapter->vfinfo[vf].vf_mac_addresses,
+   vf, IXGBE_RAH_AV);
+
+
+   if (adapter->vfinfo[vf].vf_lpe)
+   ixgbe_set_vf_lpe(adapter, &adapter->vfinfo[vf].vf_lpe, vf);
+}
+
 static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 {
struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
@@ -1047,6 +1083,7 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
break;
case IXGBE_VF_SET_LPE:
retval = ixgbe_set_vf_lpe(adapter, msgbuf, vf);
+   adapter->vfinfo[vf].vf_lpe = *msgbuf;
break;
case IXGBE_VF_SET_MACVLAN:
retval = ixgbe_set_vf_macvlan_msg(adapter, msgbuf, vf);
@@ -1063,6 +1100,9 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
case IXGBE_VF_GET_RSS_KEY:
retval = ixgbe_get_vf_rss_key(adapter, msgbuf, vf);
break;
+   case IXGBE_VF_NOTIFY_RESUME:
+   ixgbe_restore_setting(adapter, vf);
+   break;
default:
e_err(drv, "Unhandled Msg %8.8x\n", msgbuf[0]);
retval = IXGBE_ERR_MBX;
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


[RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-21 Thread Lan Tianyu

This patchset proposes a new solution for adding live migration support to
the 82599 SRIOV network card.

In our solution, we prefer to put all device-specific operations into the VF
and PF drivers and keep the code in Qemu more general.


VF status migration
===================
VF status can be divided into 4 parts:
1) PCI configure regs
2) MSIX configure
3) VF status in the PF driver
4) VF MMIO regs

The first three are all handled by Qemu.
The PCI configure space regs and MSIX configure are originally
stored in Qemu. To let Qemu save and restore "VF status in the PF
driver" during migration, the series adds a new sysfs node
"state_in_pf" under the VF sysfs directory.

For VF MMIO regs, we introduce a self-emulation layer in the VF
driver that records MMIO reg values on each MMIO read or write
and keeps this data in guest memory. It is migrated to the new
machine together with the rest of guest memory.


VF function restoration
=======================
Restoring VF operation is done in the VF and PF drivers.

To let the VF driver know the migration status, Qemu fakes VF
PCI configure regs to indicate the migration status, and the series adds
a new sysfs node "notify_vf" that triggers the VF mailbox irq in order to
notify the VF about migration status changes.

Transmit/Receive descriptor head regs are read-only, so they can't
be restored by writing back the recorded reg values directly, and they
are set to 0 during VF reset. To reuse the original tx/rx rings, the driver
shifts the desc ring so that the desc pointed to by the original head reg
becomes the first entry of the ring, and then enables the tx/rx rings. The
VF resumes receiving and transmitting from the original head desc.
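
The ring shift described above amounts to rotating the descriptor array so
that the old head lands at index 0, and remapping any software indices by
the same amount. A minimal sketch, assuming a simplified descriptor layout
and hypothetical names (demo_desc, demo_ring_shift, demo_shift_index); it is
not the ixgbevf_tx_ring_shift()/ixgbevf_rx_ring_shift() code from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified descriptor; the real hardware layout differs. */
struct demo_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t status;
};

/* Reverse the half-open range [lo, hi) in place. */
static void demo_reverse(struct demo_desc *d, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        struct demo_desc tmp;

        hi--;
        tmp = d[lo];
        d[lo] = d[hi];
        d[hi] = tmp;
        lo++;
    }
}

/*
 * Rotate the ring left by `head` entries so that the descriptor that was
 * at index `head` ends up at index 0. Returns 0 on success, -1 on bad input.
 */
int demo_ring_shift(struct demo_desc *ring, size_t count, size_t head)
{
    if (ring == NULL || count == 0 || head >= count)
        return -1;

    demo_reverse(ring, 0, head);
    demo_reverse(ring, head, count);
    demo_reverse(ring, 0, count);
    return 0;
}

/* Map an index in the old ring to its position after the shift. */
size_t demo_shift_index(size_t idx, size_t count, size_t head)
{
    assert(count > 0 && idx < count && head < count);
    return (idx + count - head) % count;
}
```

The three-reversal rotation needs no scratch buffer, which is convenient for
a ring that lives in DMA-coherent memory; a temporary copy would work just as
well for a sketch.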


Tracking DMA accessed memory
============================
Migration relies on dirty page tracking to migrate memory.
Hardware can't automatically mark a page as dirty after a DMA
memory access. VF descriptor rings and data buffers are modified
by hardware when receiving and transmitting data. To track such dirty
memory manually, the driver does dummy writes (read a byte and write it
back) when receiving and transmitting data.
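
The dummy write can be pictured as touching one byte in each page that the
device may have written, so that the CPU-side write marks the page dirty.
The sketch below is a user-space illustration under stated assumptions: a
4096-byte page size and the hypothetical helper name demo_mark_dirty. It
says nothing about where the series actually places these writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size */

/*
 * Read one byte and write it back in every page that overlaps
 * [buf, buf + len). The contents are unchanged; the write only exists so
 * that the page is seen as modified by the CPU. Returns the number of
 * pages touched.
 */
size_t demo_mark_dirty(volatile uint8_t *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    size_t off = 0;
    size_t touched = 0;

    assert(buf != NULL || len == 0);

    while (off < len) {
        uint8_t v = buf[off];  /* read one byte ... */

        buf[off] = v;          /* ... and write it back unchanged */
        touched++;
        /* advance to the first byte of the next page */
        off += DEMO_PAGE_SIZE - (size_t)((start + off) % DEMO_PAGE_SIZE);
    }
    return touched;
}
```

The volatile qualifier keeps the compiler from dropping the apparently
redundant store in this user-space version.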


Service down time test
======================
So far, we have tested migration between two laptops with 82599 nics which
are connected to a gigabit switch. We pinged the VF at a 0.001s interval
during migration from the host on the source side. The service down
time is about 180ms.

[983769928.053604] 64 bytes from 10.239.48.100: icmp_seq=4131 ttl=64 time=2.79 ms
[983769928.056422] 64 bytes from 10.239.48.100: icmp_seq=4132 ttl=64 time=2.79 ms
[983769928.059241] 64 bytes from 10.239.48.100: icmp_seq=4133 ttl=64 time=2.79 ms
[983769928.062071] 64 bytes from 10.239.48.100: icmp_seq=4134 ttl=64 time=2.80 ms
[983769928.064890] 64 bytes from 10.239.48.100: icmp_seq=4135 ttl=64 time=2.79 ms
[983769928.067716] 64 bytes from 10.239.48.100: icmp_seq=4136 ttl=64 time=2.79 ms
[983769928.070538] 64 bytes from 10.239.48.100: icmp_seq=4137 ttl=64 time=2.79 ms
[983769928.073360] 64 bytes from 10.239.48.100: icmp_seq=4138 ttl=64 time=2.79 ms
[983769928.083444] no answer yet for icmp_seq=4139
[983769928.093524] no answer yet for icmp_seq=4140
[983769928.103602] no answer yet for icmp_seq=4141
[983769928.113684] no answer yet for icmp_seq=4142
[983769928.123763] no answer yet for icmp_seq=4143
[983769928.133854] no answer yet for icmp_seq=4144
[983769928.143931] no answer yet for icmp_seq=4145
[983769928.154008] no answer yet for icmp_seq=4146
[983769928.164084] no answer yet for icmp_seq=4147
[983769928.174160] no answer yet for icmp_seq=4148
[983769928.184236] no answer yet for icmp_seq=4149
[983769928.194313] no answer yet for icmp_seq=4150
[983769928.204390] no answer yet for icmp_seq=4151
[983769928.214468] no answer yet for icmp_seq=4152
[983769928.224556] no answer yet for icmp_seq=4153
[983769928.234632] no answer yet for icmp_seq=4154
[983769928.244709] no answer yet for icmp_seq=4155
[983769928.254783] no answer yet for icmp_seq=4156
[983769928.256094] 64 bytes from 10.239.48.100: icmp_seq=4139 ttl=64 time=182 ms
[983769928.256107] 64 bytes from 10.239.48.100: icmp_seq=4140 ttl=64 time=172 ms
[983769928.256114] no answer yet for icmp_seq=4157
[983769928.256236] 64 bytes from 10.239.48.100: icmp_seq=4141 ttl=64 time=162 ms
[983769928.256245] 64 bytes from 10.239.48.100: icmp_seq=4142 ttl=64 time=152 ms
[983769928.256272] 64 bytes from 10.239.48.100: icmp_seq=4143 ttl=64 time=142 ms
[983769928.256310] 64 bytes from 10.239.48.100: icmp_seq=4144 ttl=64 time=132 ms
[983769928.256325] 64 bytes from 10.239.48.100: icmp_seq=4145 ttl=64 time=122 ms
[983769928.256332] 64 bytes from 10.239.48.100: icmp_seq=4146 ttl=64 time=112 ms
[983769928.256440] 64 bytes from 10.239.48.100: icmp_seq=4147 ttl=64 time=102 ms
[983769928.256455] 64 bytes from 10.239.48.100: icmp_seq=4148 ttl=64 time=92.3 ms
[983769928.256494] 64 bytes from 10.239.48.100: icmp_seq=4149 ttl=64 time=82.3 ms
[983769928.256503] 64

[RFC Patch 08/12] IXGBEVF: Rework code of finding the end transmit desc of package

2015-10-21 Thread Lan Tianyu

When transmitting a packet, the end transmit desc of the packet
indicates whether the packet has already been sent. The current code
records the end desc's pointer in next_to_watch of struct tx buffer.
This breaks if the desc ring is shifted after migration, because the
pointer becomes invalid. This patch replaces the recorded pointer with
the desc number of the packet and finds the end desc via the first
desc and the desc number.
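
To make the index arithmetic concrete: storing a distance instead of a
pointer means the end descriptor can be recomputed from wherever the first
descriptor currently sits, including after the ring has been shifted. The
helper names below (demo_desc_distance, demo_end_index) are hypothetical and
the sketch ignores the driver's negative-index convention for `i`; it only
shows the wraparound calculation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of steps from descriptor index `first` forward to `last`,
 * wrapping at `count` entries.
 */
uint16_t demo_desc_distance(uint16_t first, uint16_t last, uint16_t count)
{
    assert(count > 0 && first < count && last < count);
    return (uint16_t)((last + count - first) % count);
}

/*
 * Index of the end descriptor, given the current index of the first
 * descriptor and the stored distance.
 */
uint16_t demo_end_index(uint16_t first, uint16_t distance, uint16_t count)
{
    assert(count > 0 && first < count && distance < count);
    return (uint16_t)((first + distance) % count);
}
```

For example, with a 512-entry ring, a packet whose first descriptor is 510
and whose last is 1 has distance 3; if a shift later moves the first
descriptor to index 4, the end descriptor is found at index 7.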

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  1 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 19 ---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 775d089..c823616 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -54,6 +54,7 @@
  */
 struct ixgbevf_tx_buffer {
union ixgbe_adv_tx_desc *next_to_watch;
+   u16 desc_num;
unsigned long time_stamp;
struct sk_buff *skb;
unsigned int bytecount;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 4446916..056841c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -210,6 +210,7 @@ static void ixgbevf_unmap_and_free_tx_resource(struct ixgbevf_ring *tx_ring,
   DMA_TO_DEVICE);
}
tx_buffer->next_to_watch = NULL;
+   tx_buffer->desc_num = 0;
tx_buffer->skb = NULL;
dma_unmap_len_set(tx_buffer, len, 0);
/* tx_buffer must be completely set up in the transmit path */
@@ -295,7 +296,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
union ixgbe_adv_tx_desc *tx_desc;
unsigned int total_bytes = 0, total_packets = 0;
unsigned int budget = tx_ring->count / 2;
-   unsigned int i = tx_ring->next_to_clean;
+   int i, watch_index;
 
if (test_bit(__IXGBEVF_DOWN, &adapter->state))
return true;
@@ -305,9 +306,17 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
i -= tx_ring->count;
 
do {
-   union ixgbe_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
+   union ixgbe_adv_tx_desc *eop_desc;
+
+   if (!tx_buffer->desc_num)
+   break;
+
+   if (i + tx_buffer->desc_num >= 0)
+   watch_index = i + tx_buffer->desc_num;
+   else
+   watch_index = i + tx_ring->count + tx_buffer->desc_num;
 
-   /* if next_to_watch is not set then there is no work pending */
+   eop_desc = IXGBEVF_TX_DESC(tx_ring, watch_index);
if (!eop_desc)
break;
 
@@ -320,6 +329,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 
/* clear next_to_watch to prevent false hangs */
tx_buffer->next_to_watch = NULL;
+   tx_buffer->desc_num = 0;
 
/* update the statistics for this packet */
total_bytes += tx_buffer->bytecount;
@@ -3457,6 +3467,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
u32 tx_flags = first->tx_flags;
__le32 cmd_type;
u16 i = tx_ring->next_to_use;
+   u16 start;
 
tx_desc = IXGBEVF_TX_DESC(tx_ring, i);
 
@@ -3540,6 +3551,8 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 
/* set next_to_watch value indicating a packet is present */
first->next_to_watch = tx_desc;
+   start = first - tx_ring->tx_buffer_info;
+   first->desc_num = (i - start >= 0) ? i - start: i + tx_ring->count - start;
 
i++;
if (i == tx_ring->count)
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag

2015-10-21 Thread Joe Stringer

On 21 October 2015 at 07:50, Florian Westphal  wrote:
> Pablo Neira Ayuso  wrote:
>> > I can then wait for that change to pop up in nf-next and just resend
>> > this series (which will then undo that change).
>>
>> I'd rather get things fixed for the existing code. This would also
>> allow simple passing back to -stable, then we can move forward discuss
>> and review your rework with sufficient time.
>
> Joe, could you take care of this and submit an OVS fix to net tree?
>
> (just add that call to nf_ct_frag6_consume_orig and take
>  the morph change directly into the OVS callpath)
>
> I will then resubmit all of this at some later point.
>
> Thanks.

Sure thing.

Re: [PATCH net-next 5/6] arcnet: com20020-pci: add led trigger support

2015-10-21 Thread kbuild test robot

Hi Michael,

[auto build test ERROR on net-next/master -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Michael-Grzeschik/arcnet-move-dev_free_skb-to-its-only-user/20151021-235034
config: i386-randconfig-x008-10211814 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/arcnet/arcnet.c: In function 'arcnet_led_event':
>> drivers/net/arcnet/arcnet.c:203:3: error: implicit declaration of function 
>> 'led_trigger_blink_oneshot' [-Werror=implicit-function-declaration]
  led_trigger_blink_oneshot(lp->recon_led_trig,
  ^
   cc1: some warnings being treated as errors

vim +/led_trigger_blink_oneshot +203 drivers/net/arcnet/arcnet.c

   197  struct arcnet_local *lp = netdev_priv(dev);
   198  unsigned long led_delay = 350;
   199  unsigned long tx_delay = 50;
   200  
   201  switch (event) {
   202  case ARCNET_LED_EVENT_RECON:
 > 203  led_trigger_blink_oneshot(lp->recon_led_trig,
   204                            &led_delay, &led_delay, 0);
   205  break;
   206  case ARCNET_LED_EVENT_OPEN:

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[RFC Patch 05/12] IXGBE: Add new sysfs interface of "notify_vf"

2015-10-21 Thread Lan Tianyu

This patch adds a new sysfs interface, "notify_vf", under the sysfs
directory of the VF PCI device so that Qemu can notify the VF when the
migration status changes.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 30 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h  |  4 
 2 files changed, 34 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index e247d67..5cc7817 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -217,10 +217,37 @@ static ssize_t ixgbe_store_state_in_pf(struct device *dev,
return count;
 }
 
+static ssize_t ixgbe_store_notify_vf(struct device *dev,
+  struct device_attribute *attr,
+  const char *buf, size_t count)
+{
+   struct ixgbe_adapter *adapter = to_adapter(dev);
+   struct ixgbe_hw *hw = &adapter->hw;
+   struct pci_dev *vf_pdev = to_pci_dev(dev);
+   int vfn = vf_pdev->virtfn_index;
+   u32 ivar;
+
+   /* Enable VF mailbox irq first */
+   IXGBE_WRITE_REG(hw, IXGBE_PVTEIMS(vfn), 0x4);
+   IXGBE_WRITE_REG(hw, IXGBE_PVTEIAM(vfn), 0x4);
+   IXGBE_WRITE_REG(hw, IXGBE_PVTEIAC(vfn), 0x4);
+
+   ivar = IXGBE_READ_REG(hw, IXGBE_PVTIVAR_MISC(vfn));
+   ivar &= ~0xFF;
+   ivar |= 0x2 | IXGBE_IVAR_ALLOC_VAL;
+   IXGBE_WRITE_REG(hw, IXGBE_PVTIVAR_MISC(vfn), ivar);
+
+   ixgbe_ping_vf(adapter, vfn);
+   return count;
+}
+
 static struct device_attribute ixgbe_per_state_in_pf_attribute =
__ATTR(state_in_pf, S_IRUGO | S_IWUSR,
ixgbe_show_state_in_pf, ixgbe_store_state_in_pf);
 
+static struct device_attribute ixgbe_per_notify_vf_attribute =
+   __ATTR(notify_vf, S_IWUSR, NULL, ixgbe_store_notify_vf);
+
 void ixgbe_add_vf_attrib(struct ixgbe_adapter *adapter)
 {
struct pci_dev *pdev = adapter->pdev;
@@ -241,6 +268,8 @@ void ixgbe_add_vf_attrib(struct ixgbe_adapter *adapter)
if (vfdev->is_virtfn) {
ret = device_create_file(&vfdev->dev,
&ixgbe_per_state_in_pf_attribute);
+   ret |= device_create_file(&vfdev->dev,
+   &ixgbe_per_notify_vf_attribute);
if (ret)
pr_warn("Unable to add VF attribute for dev %s,\n",
dev_name(&vfdev->dev));
@@ -269,6 +298,7 @@ void ixgbe_remove_vf_attrib(struct ixgbe_adapter *adapter)
while (vfdev) {
if (vfdev->is_virtfn) {
device_remove_file(&vfdev->dev, &ixgbe_per_state_in_pf_attribute);
+   device_remove_file(&vfdev->dev, &ixgbe_per_notify_vf_attribute);
}
 
vfdev = pci_get_device(pdev->vendor, vf_id, vfdev);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index dd6ba59..c6ddb66 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -2302,6 +2302,10 @@ enum {
 #define IXGBE_PVFTDT(P)(0x06018 + (0x40 * (P)))
 #define IXGBE_PVFTDWBAL(P) (0x06038 + (0x40 * (P)))
 #define IXGBE_PVFTDWBAH(P) (0x0603C + (0x40 * (P)))
+#define IXGBE_PVTEIMS(P)   (0x00D00 + (4 * (P)))
+#define IXGBE_PVTIVAR_MISC(P)  (0x04E00 + (4 * (P)))
+#define IXGBE_PVTEIAC(P)   (0x00F00 + (4 * P))
+#define IXGBE_PVTEIAM(P)   (0x04D00 + (4 * P))
 
 #define IXGBE_PVFTDWBALn(q_per_pool, vf_number, vf_q_index) \
(IXGBE_PVFTDWBAL((q_per_pool)*(vf_number) + (vf_q_index)))
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


[RFC Patch 04/12] IXGBE: Add ixgbe_ping_vf() to notify a specified VF via mailbox msg.

2015-10-21 Thread Lan Tianyu

This patch adds ixgbe_ping_vf() to notify a specified VF. When the
migration status changes, the VF needs to be notified of the change.
The VF driver checks the migration status when it gets a mailbox msg.
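
The refactoring in this patch boils down to "build one ping word per VF,
and let the all-VFs path loop over the per-VF path". A stand-alone sketch of
that shape follows. The DEMO_ constants and demo_ names are placeholders
invented for the example; the real values and the mailbox write live in the
driver's own headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values, for illustration only. */
#define DEMO_PF_CONTROL_MSG 0x0100u
#define DEMO_MSGTYPE_CTS    0x20000000u

struct demo_vf_info {
    bool clear_to_send;
};

/* Build the control word sent to one VF. */
uint32_t demo_build_ping(const struct demo_vf_info *vf)
{
    uint32_t ping = DEMO_PF_CONTROL_MSG;

    assert(vf != NULL);
    if (vf->clear_to_send)
        ping |= DEMO_MSGTYPE_CTS;
    return ping;
}

/*
 * Ping every VF by reusing the single-VF path. `out[i]` stands in for
 * the mailbox write to VF i.
 */
void demo_ping_all(const struct demo_vf_info *vfs, int num_vfs, uint32_t *out)
{
    int i;

    assert((vfs != NULL && out != NULL) || num_vfs == 0);
    for (i = 0; i < num_vfs; i++)
        out[i] = demo_build_ping(&vfs[i]);
}
```

Splitting out the single-VF function is what lets a sysfs handler notify
just the VF that is being migrated.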

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 19 ---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  1 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 89671eb..e247d67 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -1318,18 +1318,23 @@ void ixgbe_disable_tx_rx(struct ixgbe_adapter *adapter)
IXGBE_WRITE_REG(hw, IXGBE_VFRE(1), 0);
 }
 
-void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
+void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vfn)
 {
struct ixgbe_hw *hw = &adapter->hw;
u32 ping;
+
+   ping = IXGBE_PF_CONTROL_MSG;
+   if (adapter->vfinfo[vfn].clear_to_send)
+   ping |= IXGBE_VT_MSGTYPE_CTS;
+   ixgbe_write_mbx(hw, &ping, 1, vfn);
+}
+
+void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
+{
int i;
 
-   for (i = 0 ; i < adapter->num_vfs; i++) {
-   ping = IXGBE_PF_CONTROL_MSG;
-   if (adapter->vfinfo[i].clear_to_send)
-   ping |= IXGBE_VT_MSGTYPE_CTS;
-   ixgbe_write_mbx(hw, &ping, 1, i);
-   }
+   for (i = 0 ; i < adapter->num_vfs; i++)
+   ixgbe_ping_vf(adapter, i);
 }
 
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 2c197e6..143e2fd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -41,6 +41,7 @@ void ixgbe_msg_task(struct ixgbe_adapter *adapter);
 int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask);
 void ixgbe_disable_tx_rx(struct ixgbe_adapter *adapter);
 void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter);
+void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vfn);
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int queue, u8 *mac);
 int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int queue, u16 vlan,
   u8 qos);
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


[RFC Patch 07/12] IXGBEVF: Add new mail box event for migration

2015-10-21 Thread Lan Tianyu

VF status in the PF driver needs to be restored after migration and after
resetting the VF hardware. This patch adds a new event that the VF driver
uses to notify the PF driver to restore the status.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/mbx.h |  3 +++
 drivers/net/ethernet/intel/ixgbevf/vf.c  | 10 ++
 drivers/net/ethernet/intel/ixgbevf/vf.h  |  1 +
 3 files changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index 82f44e0..22761d8 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -112,6 +112,9 @@ enum ixgbe_pfvf_api_rev {
 #define IXGBE_VF_GET_RETA  0x0a/* VF request for RETA */
 #define IXGBE_VF_GET_RSS_KEY   0x0b/* get RSS hash key */
 
+/* mail box event for live migration  */
+#define IXGBE_VF_NOTIFY_RESUME  0x0c /* VF notify PF migration to restore status */
+
 /* length of permanent address message returned from PF */
 #define IXGBE_VF_PERMADDR_MSG_LEN  4
 /* word in permanent address message with the current multicast type */
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index d1339b0..1e4e5e6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -717,6 +717,15 @@ int ixgbevf_get_queues(struct ixgbe_hw *hw, unsigned int *num_tcs,
return err;
 }
 
+static void ixgbevf_notify_resume_vf(struct ixgbe_hw *hw)
+{
+   struct ixgbe_mbx_info *mbx = &hw->mbx;
+   u32 msgbuf[1];
+
+   msgbuf[0] = IXGBE_VF_NOTIFY_RESUME;
+   mbx->ops.write_posted(hw, msgbuf, 1);
+}
+
 static const struct ixgbe_mac_operations ixgbevf_mac_ops = {
.init_hw= ixgbevf_init_hw_vf,
.reset_hw   = ixgbevf_reset_hw_vf,
@@ -729,6 +738,7 @@ static const struct ixgbe_mac_operations ixgbevf_mac_ops = {
.update_mc_addr_list= ixgbevf_update_mc_addr_list_vf,
.set_uc_addr= ixgbevf_set_uc_addr_vf,
.set_vfta   = ixgbevf_set_vfta_vf,
+   .notify_resume  = ixgbevf_notify_resume_vf,
 };
 
 const struct ixgbevf_info ixgbevf_82599_vf_info = {
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h b/drivers/net/ethernet/intel/ixgbevf/vf.h
index 6a3f4eb..a25fe81 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -70,6 +70,7 @@ struct ixgbe_mac_operations {
s32 (*disable_mc)(struct ixgbe_hw *);
s32 (*clear_vfta)(struct ixgbe_hw *);
s32 (*set_vfta)(struct ixgbe_hw *, u32, u32, bool);
+   void (*notify_resume)(struct ixgbe_hw *); 
 };
 
 enum ixgbe_mac_type {
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


[RFC Patch 09/12] IXGBEVF: Add live migration support for VF driver

2015-10-21 Thread Lan Tianyu

To let the VF driver in the guest know the migration status, Qemu will
fake PCI configure regs 0xF0 and 0xF1 to show the migration status and
get an ack from the VF driver.

When migration starts, Qemu will set reg "0xF0" to 1, notify the
VF driver by triggering a mail box msg, and wait for the VF driver to
signal that it is ready for migration (by setting reg "0xF1" to 1). After
migration, Qemu will set reg "0xF0" to 0 and notify the VF driver by mail
box irq. The VF driver begins to restore tx/rx function after detecting
the status change.

When the VF receives a mail box irq, it checks reg "0xF0" in the service
task function to get the migration status and performs the related
operations according to its value.

Steps for restarting the receive and transmit functions:
1) Restore VF status in the PF driver by sending a mail box event to the PF driver
2) Write back reg values recorded by the self-emulation layer
3) Restart rx/tx rings
4) Recover interrupts

Transmit/Receive descriptor head regs are read-only, so they can't
be restored by writing back the recorded reg values directly, and they
are set to 0 during VF reset. To reuse the original tx/rx rings, the driver
shifts the desc ring so that the desc pointed to by the original head reg
becomes the first entry of the ring, and then enables the tx/rx rings. The
VF resumes receiving and transmitting from the original head desc.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/defines.h   |   6 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h   |   7 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  | 115 -
 .../net/ethernet/intel/ixgbevf/self-emulation.c| 107 +++
 4 files changed, 232 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 770e21a..113efd2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -239,6 +239,12 @@ struct ixgbe_adv_tx_context_desc {
__le32 mss_l4len_idx;
 };
 
+union ixgbevf_desc {
+   union ixgbe_adv_tx_desc rx_desc;
+   union ixgbe_adv_rx_desc tx_desc;
+   struct ixgbe_adv_tx_context_desc tx_context_desc;
+};
+
 /* Adv Transmit Descriptor Config Masks */
 #define IXGBE_ADVTXD_DTYP_MASK 0x00F0 /* DTYP mask */
 #define IXGBE_ADVTXD_DTYP_CTXT 0x0020 /* Advanced Context Desc */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index c823616..6eab402e 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -109,7 +109,7 @@ struct ixgbevf_ring {
struct ixgbevf_ring *next;
struct net_device *netdev;
struct device *dev;
-   void *desc; /* descriptor ring memory */
+   union ixgbevf_desc *desc;   /* descriptor ring memory */
dma_addr_t dma; /* phys. address of descriptor ring */
unsigned int size;  /* length in bytes */
u16 count;  /* amount of descriptors */
@@ -493,6 +493,11 @@ extern void ixgbevf_write_eitr(struct ixgbevf_q_vector 
*q_vector);
 
 void ixgbe_napi_add_all(struct ixgbevf_adapter *adapter);
 void ixgbe_napi_del_all(struct ixgbevf_adapter *adapter);
+int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head);
+int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head);
+void ixgbevf_restore_state(struct ixgbevf_adapter *adapter);
+inline void ixgbevf_irq_enable(struct ixgbevf_adapter *adapter);
+
 
 #ifdef DEBUG
 char *ixgbevf_get_hw_dev_name(struct ixgbe_hw *hw);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 056841c..15ec361 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -91,6 +91,10 @@ MODULE_DESCRIPTION("Intel(R) 10 Gigabit Virtual Function 
Network Driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 
+
+#define MIGRATION_COMPLETED   0x00
+#define MIGRATION_IN_PROGRESS 0x01
+
 #define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK)
 static int debug = -1;
 module_param(debug, int, 0);
@@ -221,6 +225,78 @@ static u64 ixgbevf_get_tx_completed(struct ixgbevf_ring 
*ring)
return ring->stats.packets;
 }
 
+int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
+{
+   struct ixgbevf_tx_buffer *tx_buffer = NULL;
+   static union ixgbevf_desc *tx_desc = NULL;
+
+   tx_buffer = vmalloc(sizeof(struct ixgbevf_tx_buffer) * (r->count));
+   if (!tx_buffer)
+   return -ENOMEM;
+
+   tx_desc = vmalloc(sizeof(union ixgbevf_desc) * r->count);
+   if (!tx_desc)
+   return -ENOMEM;
+
+   memcpy(tx_desc, r->desc, sizeof(union ixgbevf_desc) * r->count);
+   memcpy(r->desc, &tx_desc[head], sizeof(union ixgbevf_desc) * (r->count - head));
+   memcpy(&r->desc[r->count - head], tx_desc, sizeof(union ixgbevf_desc) * head);
+
+

Re: [PATCH -next] net: hisilicon: Never build on SPARC

2015-10-21 Thread Guenter Roeck

On 10/21/2015 08:57 AM, Arnd Bergmann wrote:

On Wednesday 21 October 2015 08:33:11 David Miller wrote:

From: Guenter Roeck 
Date: Wed, 21 Oct 2015 07:56:18 -0700

@@ -57,6 +57,11 @@ extern int of_dma_get_range(struct device_node *np,
u64 *dma_addr,
  u64 *paddr, u64 *size);
   extern bool of_dma_is_coherent(struct device_node *np);
   #else /* CONFIG_OF_ADDRESS */
+static inline u64 of_translate_address(struct device_node *np, const
__be32 *addr)
+{
+return 0;

Maybe return OF_BAD_ADDR ?

The thing to really do on sparc, is just return the address raw untranslated
because that just works.

We still need to check #address-cells, right?

Something like this?

static inline u64 of_translate_address(struct device_node *np, const __be32 
*addr)
{
#if defined(CONFIG_SPARC) || defined(CONFIG_M68K)
int pna = of_n_addr_cells(np);
u64 ret = be32_to_cpu(addr[pna - 1]);

if (pna > 1)
ret += (u64)be32_to_cpu(addr[pna - 2]) << 32;

return ret;

That suggests that sparc would need a translation after all, which
seems to contradict what David said earlier.

Anyway, if it gets that complicated, I think we should stick with
just returning OF_BAD_ADDR. The above really suggests the need for
an architecture specific solution.

Guenter

#else
return OF_BAD_ADDR;
#endif
}

Arnd

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] net: dsa: mv88e6060: Fix false positive lockdep splat

2015-10-21 Thread Neil Armstrong

Like the change made for mv88e6xxx, use mutex_lock_nested() to keep
lockdep from reporting false positives caused by nested MDIO busses.

The false positive was observed using a mv88e6060 from a TI816X SoC.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index c29aebe..b1db460 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -19,14 +19,24 @@
 #define REG_PORT(p)(8 + (p))
 #define REG_GLOBAL 0x0f

+/* MDIO bus access can be nested in the case of PHYs connected to the
+ * internal MDIO bus of the switch, which is accessed via MDIO bus of
+ * the Ethernet interface. Avoid lockdep false positives by using
+ * mutex_lock_nested().
+ */
 static int reg_read(struct dsa_switch *ds, int addr, int reg)
 {
+   int ret;
struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);

if (bus == NULL)
return -EINVAL;

-   return mdiobus_read(bus, ds->pd->sw_addr + addr, reg);
+   mutex_lock_nested(&bus->mdio_lock, SINGLE_DEPTH_NESTING);
+   ret = bus->read(bus, ds->pd->sw_addr, reg);
+   mutex_unlock(&bus->mdio_lock);
+
+   return ret;
 }

 #define REG_READ(addr, reg)\
@@ -42,12 +52,17 @@ static int reg_read(struct dsa_switch *ds, int addr, int 
reg)

 static int reg_write(struct dsa_switch *ds, int addr, int reg, u16 val)
 {
+   int ret;
struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);

if (bus == NULL)
return -EINVAL;

-   return mdiobus_write(bus, ds->pd->sw_addr + addr, reg, val);
+   mutex_lock_nested(&bus->mdio_lock, SINGLE_DEPTH_NESTING);
+   ret = bus->write(bus, ds->pd->sw_addr, reg, val);
+   mutex_unlock(&bus->mdio_lock);
+
+   return ret;
 }

 #define REG_WRITE(addr, reg, val)  \
-- 
1.9.1


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-21 Thread Casper . Dik

From: David Miller 
Date: Wed, 21 Oct 2015 08:30:08 -0700 (PDT) (17:30 CEST)

>From: Alan Burlison 
>Date: Wed, 21 Oct 2015 15:38:51 +0100
>
>> While this algorithm is pretty expensive, it is not often invoked.
>
>I bet it can be easily intentionally invoked, by a malicious entity no
>less.

It is only expensive within the process itself.  Whether it is run inside 
the kernel isn't much different in the context of Solaris.  If you have an 
attacker who can run any code, it doesn't really matter what that code 
is.  It is not really expensive (it does not grab expensive locks or hold
them for any length of time).  It's basically O(n) in the number of 
threads in the process.

If you have an application which can be triggered into doing that, it is 
still a bug in the application.

Is such a socket still listed with netstat on Linux?  I believe it
uses /proc and it will not be able to find that socket through the list of 
opened files.

If we look at our typical problem we have an accept loop:

for (;;) {
newfd = accept(fd, ...);  /* X */

/* stuff */
}

While we have a second thread doing a "close(fd);" and possibly opening 
another file which just happens to return this particular fd.

In Solaris one of the following things will happen,
whatever the first thread is doing once close() is called:

- accept() dies with EBADF (close() before or during the call to
  accept())
- accept() returns some other error (new fd you can't accept on)
- accept() returns a new fd (if it was closed and reopened and
  the new fd allows accept())

On Linux exactly the same thing happens *except* when we find ourselves in 
accept(); then we wait until a connection is made or "shutdown()" is called.

I don't think any of the outcomes in the first thread is acceptable; 
clearly there is insufficient synchronization between the threads.

At that point Linux cannot find out who owns the socket:

#  netstat -p -a | grep /tmp/unix
unix  2  [ ACC ] STREAM LISTENING 14743  -   /tmp/unix_sock

In Solaris you'd get:

netstat -u -f unix| grep unix_
stream-ord casper 5334 shutdown   /tmp/unix_sock

Simple synchronization can be done.

Casper
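
One way such synchronization could look, as a hedged sketch rather than
anything proposed in this thread: the accepting thread never blocks in
accept() directly but in poll() on the listening fd plus the read end of a
wake-up pipe. The thread that wants to close the socket writes a byte to the
pipe, waits for the accepting thread to leave, and only then calls close().
The names accept_or_stop() and ACCEPT_STOPPED are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define ACCEPT_STOPPED (-2) /* hypothetical sentinel: owner asked us to stop */

/*
 * Wait until a connection is pending on listen_fd or the owner signals
 * shutdown through the pipe whose read end is wake_fd.
 * Returns a connected fd, ACCEPT_STOPPED, or -1 with errno set.
 */
static int accept_or_stop(int listen_fd, int wake_fd)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd, .events = POLLIN },
        { .fd = wake_fd,   .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Check the stop request first so that a shutdown always wins. */
        if (fds[1].revents)
            return ACCEPT_STOPPED;
        if (fds[0].revents & POLLIN)
            return accept(listen_fd, NULL, NULL);
        if (fds[0].revents) {
            errno = EBADF;
            return -1;
        }
    }
}
```

With this scheme the fd number cannot be reused while another thread is
still about to call accept() on it, because close() happens only after the
accept loop has returned ACCEPT_STOPPED.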


[PATCH net-next 4/6] arcnet: com20020-pci: add rotary index support

2015-10-21 Thread Michael Grzeschik

The EAE PLX-PCI card has a special rotary encoder
to configure the address of every card individually.
We use this information for the initial setup of
the card's dev_id.
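
As an illustration of the computation in the hunk below: the upper nibble of
the rotary register is XORed into dev_id, and MA1 cards start from 0xc. The
helper name rotary_dev_id() is hypothetical, and the sketch assumes dev_id
starts at 0 for every other card type.

```c
#include <stdint.h>

/*
 * Derive a device id from the raw rotary register value.
 * 'is_ma1' selects the 0xc starting value used for "EAE PLX-PCI MA1".
 */
static unsigned int rotary_dev_id(uint8_t reg, int is_ma1)
{
    unsigned int dev_id = is_ma1 ? 0xc : 0; /* assumption: 0 for other cards */

    dev_id ^= reg >> 4; /* only the upper nibble carries the rotary position */
    return dev_id;
}
```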

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/com20020-pci.c | 33 +
 drivers/net/arcnet/com20020.h |  4 
 2 files changed, 37 insertions(+)

diff --git a/drivers/net/arcnet/com20020-pci.c 
b/drivers/net/arcnet/com20020-pci.c
index e3b7c14e..637a611 100644
--- a/drivers/net/arcnet/com20020-pci.c
+++ b/drivers/net/arcnet/com20020-pci.c
@@ -68,6 +68,7 @@ static int com20020pci_probe(struct pci_dev *pdev,
 const struct pci_device_id *id)
 {
struct com20020_pci_card_info *ci;
+   struct com20020_pci_channel_map *mm;
struct net_device *dev;
struct arcnet_local *lp;
struct com20020_priv *priv;
@@ -84,9 +85,22 @@ static int com20020pci_probe(struct pci_dev *pdev,
 
ci = (struct com20020_pci_card_info *)id->driver_data;
priv->ci = ci;
+   mm = &ci->misc_map;
 
INIT_LIST_HEAD(&priv->list_dev);
 
+   if (mm->size) {
+   ioaddr = pci_resource_start(pdev, mm->bar) + mm->offset;
+   r = devm_request_region(&pdev->dev, ioaddr, mm->size,
+   "com20020-pci");
+   if (!r) {
+   pr_err("IO region %xh-%xh already allocated.\n",
+  ioaddr, ioaddr + mm->size - 1);
+   return -EBUSY;
+   }
+   priv->misc = ioaddr;
+   }
+
for (i = 0; i < ci->devcount; i++) {
struct com20020_pci_channel_map *cm = &ci->chan_map_tbl[i];
struct com20020_dev *card;
@@ -132,6 +146,13 @@ static int com20020pci_probe(struct pci_dev *pdev,
lp->timeout = timeout;
lp->hw.owner = THIS_MODULE;
 
+   /* Get the dev_id from the PLX rotary coder */
+   if (!strncmp(ci->name, "EAE PLX-PCI MA1", 15))
+   dev->dev_id = 0xc;
+   dev->dev_id ^= inb(priv->misc + ci->rotary) >> 4;
+
+   snprintf(dev->name, sizeof(dev->name), "arc%d-%d", dev->dev_id, i);
+
if (arcnet_inb(ioaddr, COM20020_REG_R_STATUS) == 0xFF) {
pr_err("IO address %Xh is empty!\n", ioaddr);
ret = -EIO;
@@ -235,6 +256,12 @@ static struct com20020_pci_card_info card_info_eae_arc1 = {
.size = 0x08,
},
},
+   .misc_map = {
+   .bar = 2,
+   .offset = 0x10,
+   .size = 0x04,
+   },
+   .rotary = 0x0,
.flags = ARC_CAN_10MBIT,
 };
 
@@ -252,6 +279,12 @@ static struct com20020_pci_card_info card_info_eae_ma1 = {
.size = 0x08,
}
},
+   .misc_map = {
+   .bar = 2,
+   .offset = 0x10,
+   .size = 0x04,
+   },
+   .rotary = 0x0,
.flags = ARC_CAN_10MBIT,
 };
 
diff --git a/drivers/net/arcnet/com20020.h b/drivers/net/arcnet/com20020.h
index 22a460f..4363b65 100644
--- a/drivers/net/arcnet/com20020.h
+++ b/drivers/net/arcnet/com20020.h
@@ -47,6 +47,9 @@ struct com20020_pci_card_info {
int devcount;
 
struct com20020_pci_channel_map chan_map_tbl[PLX_PCI_MAX_CARDS];
+   struct com20020_pci_channel_map misc_map;
+
+   int rotary;
 
unsigned int flags;
 };
@@ -54,6 +57,7 @@ struct com20020_pci_card_info {
 struct com20020_priv {
struct com20020_pci_card_info *ci;
struct list_head list_dev;
+   int __iomem misc;
 };
 
 struct com20020_dev {
-- 
2.6.1


[PATCH net-next 5/6] arcnet: com20020-pci: add led trigger support

2015-10-21 Thread Michael Grzeschik

The EAE PLX-PCI card has special leds on the main io pci resource
bar. This patch adds support for triggering the conflict and data leds
from packet activity.

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/arcdevice.h| 19 ++
 drivers/net/arcnet/arcnet.c   | 72 ++
 drivers/net/arcnet/com20020-pci.c | 73 +++
 drivers/net/arcnet/com20020.h | 10 ++
 4 files changed, 174 insertions(+)

diff --git a/drivers/net/arcnet/arcdevice.h b/drivers/net/arcnet/arcdevice.h
index d7fdea1..2edc0c0 100644
--- a/drivers/net/arcnet/arcdevice.h
+++ b/drivers/net/arcnet/arcdevice.h
@@ -237,6 +237,8 @@ struct Outgoing {
numsegs;/* number of segments */
 };
 
+#define ARCNET_LED_NAME_SZ (IFNAMSIZ + 6)
+
 struct arcnet_local {
uint8_t config, /* current value of CONFIG register */
timeout,/* Extended timeout for COM20020 */
@@ -260,6 +262,11 @@ struct arcnet_local {
/* On preemtive and SMB a lock is needed */
spinlock_t lock;
 
+   struct led_trigger *tx_led_trig;
+   char tx_led_trig_name[ARCNET_LED_NAME_SZ];
+   struct led_trigger *recon_led_trig;
+   char recon_led_trig_name[ARCNET_LED_NAME_SZ];
+
/*
 * Buffer management: an ARCnet card has 4 x 512-byte buffers, each of
 * which can be used for either sending or receiving.  The new dynamic
@@ -309,6 +316,8 @@ struct arcnet_local {
int (*reset)(struct net_device *dev, int really_reset);
void (*open)(struct net_device *dev);
void (*close)(struct net_device *dev);
+   void (*datatrigger) (struct net_device * dev, int enable);
+   void (*recontrigger) (struct net_device * dev, int enable);
 
void (*copy_to_card)(struct net_device *dev, int bufnum,
 int offset, void *buf, int count);
@@ -319,6 +328,16 @@ struct arcnet_local {
void __iomem *mem_start;/* pointer to ioremap'ed MMIO */
 };
 
+enum arcnet_led_event {
+   ARCNET_LED_EVENT_RECON,
+   ARCNET_LED_EVENT_OPEN,
+   ARCNET_LED_EVENT_STOP,
+   ARCNET_LED_EVENT_TX,
+};
+
+void arcnet_led_event(struct net_device *netdev, enum arcnet_led_event event);
+void devm_arcnet_led_init(struct net_device *netdev, int index, int subid);
+
 #if ARCNET_DEBUG_MAX & D_SKB
 void arcnet_dump_skb(struct net_device *dev, struct sk_buff *skb, char *desc);
 #else
diff --git a/drivers/net/arcnet/arcnet.c b/drivers/net/arcnet/arcnet.c
index 542e2b4..4242522 100644
--- a/drivers/net/arcnet/arcnet.c
+++ b/drivers/net/arcnet/arcnet.c
@@ -52,6 +52,8 @@
 #include 
 #include 
 
+#include 
+
 #include "arcdevice.h"
 #include "com9026.h"
 
@@ -189,6 +191,71 @@ static void arcnet_dump_packet(struct net_device *dev, int 
bufnum,
 
 #endif
 
+/* Trigger a LED event in response to a ARCNET device event */
+void arcnet_led_event(struct net_device *dev, enum arcnet_led_event event)
+{
+   struct arcnet_local *lp = netdev_priv(dev);
+   unsigned long led_delay = 350;
+   unsigned long tx_delay = 50;
+
+   switch (event) {
+   case ARCNET_LED_EVENT_RECON:
+   led_trigger_blink_oneshot(lp->recon_led_trig,
+ &led_delay, &led_delay, 0);
+   break;
+   case ARCNET_LED_EVENT_OPEN:
+   led_trigger_event(lp->tx_led_trig, LED_OFF);
+   led_trigger_event(lp->recon_led_trig, LED_OFF);
+   break;
+   case ARCNET_LED_EVENT_STOP:
+   led_trigger_event(lp->tx_led_trig, LED_OFF);
+   led_trigger_event(lp->recon_led_trig, LED_OFF);
+   break;
+   case ARCNET_LED_EVENT_TX:
+   led_trigger_blink_oneshot(lp->tx_led_trig,
+ &tx_delay, &tx_delay, 0);
+   break;
+   }
+}
+EXPORT_SYMBOL_GPL(arcnet_led_event);
+
+static void arcnet_led_release(struct device *gendev, void *res)
+{
+   struct arcnet_local *lp = netdev_priv(to_net_dev(gendev));
+
+   led_trigger_unregister_simple(lp->tx_led_trig);
+   led_trigger_unregister_simple(lp->recon_led_trig);
+}
+
+/* Register ARCNET LED triggers for a arcnet device
+ *
+ * This is normally called from a driver's probe function
+ */
+void devm_arcnet_led_init(struct net_device *netdev, int index, int subid)
+{
+   struct arcnet_local *lp = netdev_priv(netdev);
+   void *res;
+
+   res = devres_alloc(arcnet_led_release, 0, GFP_KERNEL);
+   if (!res) {
+   netdev_err(netdev, "cannot register LED triggers\n");
+   return;
+   }
+
+   snprintf(lp->tx_led_trig_name, sizeof(lp->tx_led_trig_name),
+"arc%d-%d-tx", index, subid);
+   snprintf(lp->recon_led_trig_name, sizeof(lp->recon_led_trig_name),
+"arc%d-%d-recon", index, subid);
+
+

Fw: [Bug 106361] New: Kernel bug: Thread aborts unexpectedly when sending UDP message

2015-10-21 Thread Stephen Hemminger



Begin forwarded message:

Date: Tue, 20 Oct 2015 15:20:24 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 106361] New: Kernel bug: Thread aborts unexpectedly when sending 
UDP message


https://bugzilla.kernel.org/show_bug.cgi?id=106361

Bug ID: 106361
   Summary: Kernel bug: Thread aborts unexpectedly when sending
UDP message
   Product: Networking
   Version: 2.5
Kernel Version: 3.3.0
  Hardware: ARM
OS: Linux
  Tree: PREEMPT_RT
Status: NEW
  Severity: high
  Priority: P1
 Component: IPV4
  Assignee: shemmin...@linux-foundation.org
  Reporter: luckythree...@163.com
Regression: No

I ran into a kernel bug on ARM Linux while using UDP in our multithreaded
application (ENB03_BPU_APP_1).
In my application, one thread aborts unexpectedly when sending a UDP message.
This happens repeatedly, within 3-12 hours of running on a system doing
primarily UDP send/receive activity.


1> And Environment :
   1.1> root@tci6614-evm:~# uname -a
Linux tci6614-evm 3.3.0-dirty #20 PREEMPT Mon May 19 14:50:04 CST 2014
armv7l unknown

   1.2> root@tci6614-evm:~# cat /proc/version 
Linux version 3.3.0-dirty (root@cpit-desktop) (gcc version 4.3.3 (Sourcery
G++ Lite 2009q1-203) ) #20 PREEMPT Mon May 19 14:50:04 CST 2014



2>  This is the stack trace:
.
skb_under_panic: text:c0269e24 len:392 put:14 head:d3cea000 data:d3ce9ff4
tail:0xd3cea17c end:0xd3cea340 dev:eth0
[ cut here ]
kernel BUG at net/core/skbuff.c:147!
Internal error: Oops - BUG: 0 [#1] PREEMPT
Modules linked in: drv_fpga_module(O)
CPU: 0Tainted: G   O  (3.3.0-dirty #20)
PC is at skb_push+0x7c/0x84
LR is at skb_push+0x7c/0x84
pc : []lr : []psr: 6013
sp : d4afbc28  ip : c04c7770  fp : d62f2a54
r10:   r9 : d62f2a50  r8 : d6008800
r7 : d3cea000  r6 : d3ce9ff4  r5 : d3cea340  r4 : d6008800
r3 : 2093  r2 : 6013  r1 : 6013  r0 : 0078
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 94960019  DAC: 0015
Process ENB03_BPU_APP_1 (pid: 1117, stack limit = 0xd4afa2e8)
Stack: (0xd4afbc28 to 0xd4afc000)
bc20:   d3cea000 d3ce9ff4 d3cea17c d3cea340 d6008800 d3cea002
bc40: d604c908 0800 d6325a80  d62f2a54 c0269e24 d62f2a00 d6325a80
bc60: 156c 017a d6008800 c0260160  017a c02884bc 016c
bc80: d6325a80 d62f2a00 d62be900 d6325a80  d4901e40 000e c028b2c0
bca0: d6008800 d6325a80 d6325a80 d4afbd8c d62be900  d6325a80 0158
bcc0: d4afbd8c c028a678 d3cea024 c028a684 d3cea024 c02aaef4  0158
bce0: d4afbf5c d62be900 0150  d6325a80  d4afbd8c c02ad378
bd00: 0158 0008 d4afbdac d4afbdbc  c01e07d0 0001 d4afbdac
bd20:    3232a8c0 3232a8c0 d2e6a8c0 3930 
bd40: c0289af0 d62beaa8      d4afbdc0
bd60: 0001 d5c9e108 d4afbf74 0001 0001 c018c0ac  d63e6a40
bd80:  d63e6a40 0001    0411 
bda0: d2e6a8c0 3232a8c0 44e03930 3232a8c0    
bdc0: 0003 d62be900 d4afbe00 d4afbf5c 0150 d4afbe00 d4afa000 0010
bde0: 7b26cc44 c02b40a0   d64f1480 0150 d4afbe00 c02416a4
be00:  fffe  0001    
be20:   d78a7500 c0053244   d78a7500 d78a7500
be40: d4afbe80 d5d5c000 d5d5c000 d5d5c000 d4afbeec c035f314 d5d6bf38 d4afbf30
be60: c04c9b94  c04dd658 c04c5144 c04be38c c036a8fc c04be0ec c04ca508
be80: d4afbf30 c0049d34  0150 d64f1480 c004a2d8  d4afbf5c
bea0: d4afbf80  008364f4 d64f1480 d4afbedc  008364f4 d4afbedc
bec0: 0150 d64f1480 d4afbedc  008364f4 c0241dec 004c4b40 3932
bee0: 3232a8c0   c035eac8  0001 004c4b40 
bf00:     0001 c004a47c  
bf20: c04f1fb8  004c4b40  d4afbf31   
bf40: 9a4359e0 07db 9a4359e0 07db c00499ac c04c9b88  d4afbedc
bf60: 0010 d4afbf78 0001    9a252ae8 0150
bf80: 0001  7b26dc8c 008364f4 0010 00397a08 0122 c0013bc8
bfa0:  c0013a20 008364f4 0010 000a 9a252ae8 0150 
bfc0: 008364f4 0010 00397a08 0122 003d0f00 beb76bd8  7b26cc44
bfe0:  7b26cc10 00399f58 0039acf4 8010 000a  
[] (skb_push+0x7c/0x84) from [] (eth_header+0x1c/0xb8)
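
The skb_under_panic line above can be checked by hand. The sketch below is
only a reading aid and assumes the usual meaning of the fields: "data" is
the pointer after the push and "put" is the number of bytes pushed. The
struct and function names are hypothetical.

```c
#include <stdint.h>

struct push_check {
    uint32_t data_before; /* data pointer before the push */
    int64_t  headroom;    /* bytes available in front of data_before */
    int      underflow;   /* nonzero if the push moved data below head */
};

/* Reconstruct the state before the push from the values in the panic line. */
static struct push_check check_push(uint32_t head, uint32_t data_after,
                                    uint32_t len)
{
    struct push_check c;

    c.data_before = data_after + len;
    c.headroom = (int64_t)c.data_before - (int64_t)head;
    c.underflow = data_after < head;
    return c;
}
```

With head 0xd3cea000, data 0xd3ce9ff4 and put 14, the data pointer before
the push was 0xd3cea002, so the buffer had only 2 bytes of headroom when
eth_header() tried to prepend the 14-byte Ethernet header.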

Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-21 Thread David Miller

From: Jiri Benc 
Date: Wed, 21 Oct 2015 17:25:02 +0200

> On Wed, 21 Oct 2015 08:32:14 -0700 (PDT), David Miller wrote:
>> As you say the apps are broken, so file a bug and have them fixed.
>> 
>> The assumption is clearly invalid, so apps cannot make such an
>> assumption.
> 
> Does it mean you would be okay with a patch that always allocates and
> assigns a new ifindex in the target netns when interface is moved
> between name spaces?

I think you're misunderstanding me if you're still recommending
kernel changes.

I'm plainly saying to remove the assumption in the apps.

If you don't show me exactly how some kernel change can lead to
the apps implementing things properly, without the invalid
assumptions, then I can only assume you didn't hear what I
said.

Re: Fw: [Bug 106361] New: Kernel bug: Thread aborts unexpectedly when sending UDP message

2015-10-21 Thread Eric Dumazet

On Wed, 2015-10-21 at 09:03 -0700, Stephen Hemminger wrote:
> 
> Begin forwarded message:
> 
> Date: Tue, 20 Oct 2015 15:20:24 +
> From: "bugzilla-dae...@bugzilla.kernel.org" 
> 
> To: "shemmin...@linux-foundation.org" 
> Subject: [Bug 106361] New: Kernel bug: Thread aborts unexpectedly when 
> sending UDP message
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=106361
> 
> Bug ID: 106361
>Summary: Kernel bug: Thread aborts unexpectedly when sending
> UDP message
>Product: Networking
>Version: 2.5
> Kernel Version: 3.3.0
>   Hardware: ARM
> OS: Linux
>   Tree: PREEMPT_RT
> Status: NEW
>   Severity: high
>   Priority: P1
>  Component: IPV4
>   Assignee: shemmin...@linux-foundation.org
>   Reporter: luckythree...@163.com
> Regression: No
> 
> I ran into a kernel bug on ARM Linux while using UDP in our multithreaded
> application (ENB03_BPU_APP_1).
> In my application, one thread aborts unexpectedly when sending a UDP message.
> This happens repeatedly, within 3-12 hours of running on a system doing
> primarily UDP send/receive activity.
> 
> 
> 1> And Environment :
>1.1> root@tci6614-evm:~# uname -a
> Linux tci6614-evm 3.3.0-dirty #20 PREEMPT Mon May 19 14:50:04 CST 2014
> armv7l unknown
> 
>1.2> root@tci6614-evm:~# cat /proc/version 
> Linux version 3.3.0-dirty (root@cpit-desktop) (gcc version 4.3.3 (Sourcery
> G++ Lite 2009q1-203) ) #20 PREEMPT Mon May 19 14:50:04 CST 2014
> 
> 
> 
> 2>  This is the stack trace:
> .
> skb_under_panic: text:c0269e24 len:392 put:14 head:d3cea000 data:d3ce9ff4
> tail:0xd3cea17c end:0xd3cea340 dev:eth0
> [ cut here ]
> kernel BUG at net/core/skbuff.c:147!
> Internal error: Oops - BUG: 0 [#1] PREEMPT
> Modules linked in: drv_fpga_module(O)
> CPU: 0Tainted: G   O  (3.3.0-dirty #20)
> PC is at skb_push+0x7c/0x84
> LR is at skb_push+0x7c/0x84
> pc : []lr : []psr: 6013
> sp : d4afbc28  ip : c04c7770  fp : d62f2a54
> r10:   r9 : d62f2a50  r8 : d6008800
> r7 : d3cea000  r6 : d3ce9ff4  r5 : d3cea340  r4 : d6008800
> r3 : 2093  r2 : 6013  r1 : 6013  r0 : 0078
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c5387d  Table: 94960019  DAC: 0015
> Process ENB03_BPU_APP_1 (pid: 1117, stack limit = 0xd4afa2e8)
> Stack: (0xd4afbc28 to 0xd4afc000)
> bc20:   d3cea000 d3ce9ff4 d3cea17c d3cea340 d6008800 d3cea002
> bc40: d604c908 0800 d6325a80  d62f2a54 c0269e24 d62f2a00 d6325a80
> bc60: 156c 017a d6008800 c0260160  017a c02884bc 016c
> bc80: d6325a80 d62f2a00 d62be900 d6325a80  d4901e40 000e c028b2c0
> bca0: d6008800 d6325a80 d6325a80 d4afbd8c d62be900  d6325a80 0158
> bcc0: d4afbd8c c028a678 d3cea024 c028a684 d3cea024 c02aaef4  0158
> bce0: d4afbf5c d62be900 0150  d6325a80  d4afbd8c c02ad378
> bd00: 0158 0008 d4afbdac d4afbdbc  c01e07d0 0001 d4afbdac
> bd20:    3232a8c0 3232a8c0 d2e6a8c0 3930 
> bd40: c0289af0 d62beaa8      d4afbdc0
> bd60: 0001 d5c9e108 d4afbf74 0001 0001 c018c0ac  d63e6a40
> bd80:  d63e6a40 0001    0411 
> bda0: d2e6a8c0 3232a8c0 44e03930 3232a8c0    
> bdc0: 0003 d62be900 d4afbe00 d4afbf5c 0150 d4afbe00 d4afa000 0010
> bde0: 7b26cc44 c02b40a0   d64f1480 0150 d4afbe00 c02416a4
> be00:  fffe  0001    
> be20:   d78a7500 c0053244   d78a7500 d78a7500
> be40: d4afbe80 d5d5c000 d5d5c000 d5d5c000 d4afbeec c035f314 d5d6bf38 d4afbf30
> be60: c04c9b94  c04dd658 c04c5144 c04be38c c036a8fc c04be0ec c04ca508
> be80: d4afbf30 c0049d34  0150 d64f1480 c004a2d8  d4afbf5c
> bea0: d4afbf80  008364f4 d64f1480 d4afbedc  008364f4 d4afbedc
> bec0: 0150 d64f1480 d4afbedc  008364f4 c0241dec 004c4b40 3932
> bee0: 3232a8c0   c035eac8  0001 004c4b40 
> bf00:     0001 c004a47c  
> bf20: c04f1fb8  004c4b40  d4afbf31   
> bf40: 9a4359e0 07db 9a4359e0 07db c00499ac c04c9b88  d4afbedc
> bf60: 0010 d4afbf78 0001    9a252ae8 0150
> bf80: 0001  7b26dc8c 008364f4 0010 00397a08 0122 c0013bc8
> bfa0:  c0013a20 008364f4

[RFC Patch 06/12] IXGBEVF: Add self emulation layer

2015-10-21 Thread Lan Tianyu

In order to restore VF function after migration, add a self emulation layer
that records register values as the registers are accessed.
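
The idea of a recording layer can be sketched in user space as a shadow
array kept in step with every register access. This is a simplified
illustration, not the patch's code: the struct and function names are
hypothetical, a plain array stands in for the MMIO window, and unlike the
patch the sketch indexes the shadow by word (addr / 4) and bounds-checks the
offset.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_SPACE_BYTES 0x4000
#define REG_COUNT (REG_SPACE_BYTES / sizeof(uint32_t))

struct shadow_regs {
    volatile uint32_t *mmio;    /* stand-in for the device register window */
    uint32_t shadow[REG_COUNT]; /* last value seen for each register */
};

/* Translate a byte offset into a word index; reject unaligned or wild ones. */
static int reg_index(uint32_t addr, size_t *idx)
{
    if (addr % sizeof(uint32_t) != 0 || addr >= REG_SPACE_BYTES)
        return -1;
    *idx = addr / sizeof(uint32_t);
    return 0;
}

static uint32_t shadow_readl(struct shadow_regs *s, uint32_t addr)
{
    size_t idx;
    uint32_t val;

    if (reg_index(addr, &idx) != 0)
        return 0xFFFFFFFFu;
    val = s->mmio[idx];
    s->shadow[idx] = val; /* remember what the hardware reported */
    return val;
}

static void shadow_writel(struct shadow_regs *s, uint32_t val, uint32_t addr)
{
    size_t idx;

    if (reg_index(addr, &idx) != 0)
        return;
    s->shadow[idx] = val; /* remember what the driver programmed */
    s->mmio[idx] = val;
}

/* After a reset, replay the recorded value of one register. */
static void shadow_restore(struct shadow_regs *s, uint32_t addr)
{
    size_t idx;

    if (reg_index(addr, &idx) == 0)
        s->mmio[idx] = s->shadow[idx];
}
```

Recording on reads as well as writes means the shadow also captures
registers the driver never programmed itself, at the cost of possibly
replaying read-only or status values that should not be written back.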

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/Makefile|  3 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  |  2 +-
 .../net/ethernet/intel/ixgbevf/self-emulation.c| 26 ++
 drivers/net/ethernet/intel/ixgbevf/vf.h|  5 -
 4 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbevf/self-emulation.c

diff --git a/drivers/net/ethernet/intel/ixgbevf/Makefile 
b/drivers/net/ethernet/intel/ixgbevf/Makefile
index 4ce4c97..841c884 100644
--- a/drivers/net/ethernet/intel/ixgbevf/Makefile
+++ b/drivers/net/ethernet/intel/ixgbevf/Makefile
@@ -31,7 +31,8 @@
 
 obj-$(CONFIG_IXGBEVF) += ixgbevf.o
 
-ixgbevf-objs := vf.o \
+ixgbevf-objs := self-emulation.o \
+   vf.o \
 mbx.o \
 ethtool.o \
 ixgbevf_main.o
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a16d267..4446916 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -156,7 +156,7 @@ u32 ixgbevf_read_reg(struct ixgbe_hw *hw, u32 reg)
 
if (IXGBE_REMOVED(reg_addr))
return IXGBE_FAILED_READ_REG;
-   value = readl(reg_addr + reg);
+   value = ixgbe_self_emul_readl(reg_addr, reg);
if (unlikely(value == IXGBE_FAILED_READ_REG))
ixgbevf_check_remove(hw, reg);
return value;
diff --git a/drivers/net/ethernet/intel/ixgbevf/self-emulation.c 
b/drivers/net/ethernet/intel/ixgbevf/self-emulation.c
new file mode 100644
index 000..d74b2da
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbevf/self-emulation.c
@@ -0,0 +1,26 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vf.h"
+#include "ixgbevf.h"
+
+static u32 hw_regs[0x4000];
+
+u32 ixgbe_self_emul_readl(volatile void __iomem *base, u32 addr)
+{
+   u32 tmp;
+
+   tmp = readl(base + addr);
+   hw_regs[(unsigned long)addr] = tmp;
+
+   return tmp;
+}
+
+void ixgbe_self_emul_writel(u32 val, volatile void __iomem *base, u32  addr)
+{
+   hw_regs[(unsigned long)addr] = val;
+   writel(val, (volatile void __iomem *)(base + addr));
+}
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h 
b/drivers/net/ethernet/intel/ixgbevf/vf.h
index d40f036..6a3f4eb 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -39,6 +39,9 @@
 
 struct ixgbe_hw;
 
+u32 ixgbe_self_emul_readl(volatile void __iomem *base, u32 addr);
+void ixgbe_self_emul_writel(u32 val, volatile void __iomem *base, u32  addr);
+
 /* iterator type for walking multicast address lists */
 typedef u8* (*ixgbe_mc_addr_itr) (struct ixgbe_hw *hw, u8 **mc_addr_ptr,
  u32 *vmdq);
@@ -182,7 +185,7 @@ static inline void ixgbe_write_reg(struct ixgbe_hw *hw, u32 
reg, u32 value)
 
if (IXGBE_REMOVED(reg_addr))
return;
-   writel(value, reg_addr + reg);
+   ixgbe_self_emul_writel(value, reg_addr, reg);
 }
 
 #define IXGBE_WRITE_REG(h, r, v) ixgbe_write_reg(h, r, v)
-- 
1.8.4.rc0.1.g8f6a3e5.dirty


[RFC Patch 10/12] IXGBEVF: Add lock to protect tx/rx ring operation

2015-10-21 Thread Lan Tianyu

Ring shifting while restoring VF function may race with the original
ring operations (transmitting/receiving packets). This patch adds tx/rx
locks to protect ring related data.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  2 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 28 ---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 6eab402e..3a748c8 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -448,6 +448,8 @@ struct ixgbevf_adapter {
 
spinlock_t mbx_lock;
unsigned long last_reset;
+   spinlock_t mg_rx_lock;
+   spinlock_t mg_tx_lock;
 };
 
 enum ixbgevf_state_t {
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 15ec361..04b6ce7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -227,8 +227,10 @@ static u64 ixgbevf_get_tx_completed(struct ixgbevf_ring 
*ring)
 
 int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
 {
+   struct ixgbevf_adapter *adapter = netdev_priv(r->netdev);
struct ixgbevf_tx_buffer *tx_buffer = NULL;
static union ixgbevf_desc *tx_desc = NULL;
+   unsigned long flags;
 
tx_buffer = vmalloc(sizeof(struct ixgbevf_tx_buffer) * (r->count));
if (!tx_buffer)
@@ -238,6 +240,7 @@ int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
if (!tx_desc)
return -ENOMEM;
 
+   spin_lock_irqsave(&adapter->mg_tx_lock, flags);
memcpy(tx_desc, r->desc, sizeof(union ixgbevf_desc) * r->count);
memcpy(r->desc, &tx_desc[head], sizeof(union ixgbevf_desc) * (r->count - head));
memcpy(&r->desc[r->count - head], tx_desc, sizeof(union ixgbevf_desc) * head);
@@ -256,6 +259,8 @@ int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
else
r->next_to_use += (r->count - head);
 
+   spin_unlock_irqrestore(&adapter->mg_tx_lock, flags);
+
vfree(tx_buffer);
vfree(tx_desc);
return 0;
@@ -263,8 +268,10 @@ int ixgbevf_tx_ring_shift(struct ixgbevf_ring *r, u32 head)
 
 int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head)
 {
+   struct ixgbevf_adapter *adapter = netdev_priv(r->netdev);
struct ixgbevf_rx_buffer *rx_buffer = NULL;
static union ixgbevf_desc *rx_desc = NULL;
+   unsigned long flags;
 
rx_buffer = vmalloc(sizeof(struct ixgbevf_rx_buffer) * (r->count));
if (!rx_buffer)
@@ -274,6 +281,7 @@ int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head)
if (!rx_desc)
return -ENOMEM;
 
+   spin_lock_irqsave(&adapter->mg_rx_lock, flags);
memcpy(rx_desc, r->desc, sizeof(union ixgbevf_desc) * (r->count));
memcpy(r->desc, &rx_desc[head], sizeof(union ixgbevf_desc) * (r->count - head));
memcpy(&r->desc[r->count - head], rx_desc, sizeof(union ixgbevf_desc) * head);
@@ -291,6 +299,7 @@ int ixgbevf_rx_ring_shift(struct ixgbevf_ring *r, u32 head)
r->next_to_use -= head;
else
r->next_to_use += (r->count - head);
+   spin_unlock_irqrestore(&adapter->mg_rx_lock, flags);
 
vfree(rx_buffer);
vfree(rx_desc);
@@ -377,6 +386,8 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector 
*q_vector,
if (test_bit(__IXGBEVF_DOWN, &adapter->state))
return true;
 
+   spin_lock(&adapter->mg_tx_lock);
+   i = tx_ring->next_to_clean;
tx_buffer = _ring->tx_buffer_info[i];
tx_desc = IXGBEVF_TX_DESC(tx_ring, i);
i -= tx_ring->count;
@@ -471,6 +482,8 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
 
+   spin_unlock(&adapter->mg_tx_lock);
+
if (check_for_tx_hang(tx_ring) && ixgbevf_check_tx_hang(tx_ring)) {
struct ixgbe_hw *hw = &adapter->hw;
union ixgbe_adv_tx_desc *eop_desc;
@@ -999,10 +1012,12 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
struct ixgbevf_ring *rx_ring,
int budget)
 {
+   struct ixgbevf_adapter *adapter = netdev_priv(rx_ring->netdev);
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 cleaned_count = ixgbevf_desc_unused(rx_ring);
struct sk_buff *skb = rx_ring->skb;
 
+   spin_lock(&adapter->mg_rx_lock);
while (likely(total_rx_packets < budget)) {
union ixgbe_adv_rx_desc *rx_desc;
 
@@ -1078,6 +1093,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
q_vector->rx.total_packets += total_rx_packets;
q_vector->rx.total_bytes += total_rx_bytes;
 
+

[RFC Patch 12/12] IXGBEVF: Track dma dirty pages

2015-10-21 Thread Lan Tianyu

Migration relies on dirty page tracking to decide which memory to migrate.
The hardware cannot automatically mark a page as dirty after a DMA
memory access. The VF descriptor rings and data buffers are modified
by the hardware when receiving and transmitting data. To track such dirty
memory manually, do dummy writes (read a byte and write it back) while
receiving and transmitting data.

Signed-off-by: Lan Tianyu 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index d22160f..ce7bd7a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -414,6 +414,9 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
break;
 
+   /* write back status to mark page dirty */
+   eop_desc->wb.status = eop_desc->wb.status;
+
/* clear next_to_watch to prevent false hangs */
tx_buffer->next_to_watch = NULL;
tx_buffer->desc_num = 0;
@@ -946,15 +949,17 @@ static struct sk_buff *ixgbevf_fetch_rx_buffer(struct ixgbevf_ring *rx_ring,
 {
struct ixgbevf_rx_buffer *rx_buffer;
struct page *page;
+   u8 *page_addr;
 
rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
page = rx_buffer->page;
prefetchw(page);
 
-   if (likely(!skb)) {
-   void *page_addr = page_address(page) +
- rx_buffer->page_offset;
+   /* Mark page dirty */
+   page_addr = page_address(page) + rx_buffer->page_offset;
+   *page_addr = *page_addr;
 
+   if (likely(!skb)) {
/* prefetch first cache line of first page */
prefetch(page_addr);
 #if L1_CACHE_BYTES < 128
@@ -1032,6 +1037,9 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
if (!ixgbevf_test_staterr(rx_desc, IXGBE_RXD_STAT_DD))
break;
 
+   /* Write back status to mark page dirty */
+   rx_desc->wb.upper.status_error = rx_desc->wb.upper.status_error;
+
/* This memory barrier is needed to keep us from reading
 * any other fields out of the rx_desc until we know the
 * RXD_STAT_DD bit is set
-- 
1.8.4.rc0.1.g8f6a3e5.dirty
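
The dummy-write idea from the commit message above can be sketched in plain userspace C. This is only an illustration under stated assumptions: touch_byte() and touch_buffer() are hypothetical helpers that do not exist in the ixgbevf driver, the buffer is assumed to start on a page boundary, and the page size is passed in by the caller. The sketch uses a volatile access because a plain self-assignment such as *p = *p may be treated as a no-op by an optimizing compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the driver): read one byte and write
 * the same value back, so that the CPU performs a store to the page.
 * The volatile qualifier keeps the compiler from dropping the store.
 */
static inline void touch_byte(void *addr)
{
	volatile uint8_t *p = addr;

	*p = *p;
}

/*
 * Touch one byte in every page-sized chunk of a buffer.
 * Assumes buf starts on a page boundary; returns the number of
 * dummy writes performed.
 */
static size_t touch_buffer(void *buf, size_t len, size_t page_size)
{
	uint8_t *p = buf;
	size_t off, touched = 0;

	assert(page_size > 0);
	for (off = 0; off < len; off += page_size) {
		touch_byte(p + off);
		touched++;
	}
	return touched;
}
```

The buffer contents are left unchanged; only the side effect of the store matters for dirty tracking.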


[PATCH net-next 3/6] arcnet: com20020-pci: set dev_port to the subdevice index

2015-10-21 Thread Michael Grzeschik

This patch sets the dev_port according to the index of
the card. This can be used by udev to name the ports
in userspace.

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/com20020-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/arcnet/com20020-pci.c b/drivers/net/arcnet/com20020-pci.c
index a12bf83..e3b7c14e 100644
--- a/drivers/net/arcnet/com20020-pci.c
+++ b/drivers/net/arcnet/com20020-pci.c
@@ -96,6 +96,7 @@ static int com20020pci_probe(struct pci_dev *pdev,
ret = -ENOMEM;
goto out_port;
}
+   dev->dev_port = i;
 
dev->netdev_ops = &com20020_netdev_ops;
 
-- 
2.6.1


[PATCH net-next 6/6] arcnet: add netif_carrier_on/off for reconnect

2015-10-21 Thread Michael Grzeschik

The arcnet device has no interrupt to detect whether the link has changed
from disconnected to connected. This patch adds a timer to toggle the
link detection. The timer is retriggered as long as
reconnection interrupts occur. If the recon interrupts hold off
for >1s, we consider the connection stable again.

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/arcdevice.h |  2 ++
 drivers/net/arcnet/arcnet.c| 25 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/net/arcnet/arcdevice.h b/drivers/net/arcnet/arcdevice.h
index 2edc0c0..20bfb9b 100644
--- a/drivers/net/arcnet/arcdevice.h
+++ b/drivers/net/arcnet/arcdevice.h
@@ -267,6 +267,8 @@ struct arcnet_local {
struct led_trigger *recon_led_trig;
char recon_led_trig_name[ARCNET_LED_NAME_SZ];
 
+   struct timer_list   timer;
+
/*
 * Buffer management: an ARCnet card has 4 x 512-byte buffers, each of
 * which can be used for either sending or receiving.  The new dynamic
diff --git a/drivers/net/arcnet/arcnet.c b/drivers/net/arcnet/arcnet.c
index 4242522..6ea963e 100644
--- a/drivers/net/arcnet/arcnet.c
+++ b/drivers/net/arcnet/arcnet.c
@@ -381,6 +381,16 @@ static void arcdev_setup(struct net_device *dev)
dev->flags = IFF_BROADCAST;
 }
 
+static void arcnet_timer(unsigned long data)
+{
+   struct net_device *dev = (struct net_device *)data;
+
+   if (!netif_carrier_ok(dev)) {
+   netif_carrier_on(dev);
+   netdev_info(dev, "link up\n");
+   }
+}
+
 struct net_device *alloc_arcdev(const char *name)
 {
struct net_device *dev;
@@ -392,6 +402,9 @@ struct net_device *alloc_arcdev(const char *name)
struct arcnet_local *lp = netdev_priv(dev);
 
spin_lock_init(&lp->lock);
+   init_timer(&lp->timer);
+   lp->timer.data = (unsigned long) dev;
+   lp->timer.function = arcnet_timer;
}
 
return dev;
@@ -490,7 +503,9 @@ int arcnet_open(struct net_device *dev)
lp->hw.intmask(dev, lp->intmask);
arc_printk(D_DEBUG, dev, "%s: %d: %s\n", __FILE__, __LINE__, __func__);
 
+   netif_carrier_off(dev);
netif_start_queue(dev);
+   mod_timer(&lp->timer, jiffies + msecs_to_jiffies(1000));
 
arcnet_led_event(dev, ARCNET_LED_EVENT_OPEN);
return 0;
@@ -507,7 +522,10 @@ int arcnet_close(struct net_device *dev)
struct arcnet_local *lp = netdev_priv(dev);
 
arcnet_led_event(dev, ARCNET_LED_EVENT_STOP);
+   del_timer_sync(&lp->timer);
+
netif_stop_queue(dev);
+   netif_carrier_off(dev);
 
/* flush TX and disable RX */
lp->hw.intmask(dev, 0);
@@ -908,6 +926,12 @@ irqreturn_t arcnet_interrupt(int irq, void *dev_id)
 
arc_printk(D_RECON, dev, "Network reconfiguration detected (status=%Xh)\n",
   status);
+   if (netif_carrier_ok(dev)) {
+   netif_carrier_off(dev);
+   netdev_info(dev, "link down\n");
+   }
+   mod_timer(&lp->timer, jiffies + msecs_to_jiffies(1000));
+
arcnet_led_event(dev, ARCNET_LED_EVENT_RECON);
/* MYRECON bit is at bit 7 of diagstatus */
if (diagstatus & 0x80)
@@ -959,6 +983,7 @@ irqreturn_t arcnet_interrupt(int irq, void *dev_id)
lp->num_recons = lp->network_down = 0;
 
arc_printk(D_DURING, dev, "not recon: clearing counters anyway.\n");
+   netif_carrier_on(dev);
}
 
if (didsomething)
-- 
2.6.1
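
The timer-based link detection described in the commit message above amounts to a debounce: every recon event drops the carrier and pushes the deadline one second into the future, and the carrier only comes back once the deadline passes without another event. A minimal userspace sketch of that logic follows. All names here (struct link_monitor, link_open(), link_recon(), link_tick()) are hypothetical and time is passed in explicitly as milliseconds. It does not model the kernel timer API or the extra netif_carrier_on() call in the non-recon interrupt path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINK_SETTLE_MS 1000u	/* quiet period taken from the commit message */

/* Hypothetical stand-in for the carrier state plus the pending timer. */
struct link_monitor {
	bool carrier_ok;	/* mirrors netif_carrier_ok() */
	bool timer_armed;	/* true while a timeout is pending */
	uint64_t deadline_ms;	/* time at which the timer would fire */
};

/* Device open: start with carrier off and arm the settle timer. */
static void link_open(struct link_monitor *m, uint64_t now_ms)
{
	m->carrier_ok = false;
	m->timer_armed = true;
	m->deadline_ms = now_ms + LINK_SETTLE_MS;
}

/* Recon event: report link down and push the deadline out again. */
static void link_recon(struct link_monitor *m, uint64_t now_ms)
{
	m->carrier_ok = false;
	m->timer_armed = true;
	m->deadline_ms = now_ms + LINK_SETTLE_MS;
}

/* Called as time advances: fire the timer once the deadline is reached. */
static void link_tick(struct link_monitor *m, uint64_t now_ms)
{
	if (m->timer_armed && now_ms >= m->deadline_ms) {
		m->timer_armed = false;
		m->carrier_ok = true;	/* no recon for a full settle period */
	}
}
```

With this model, a stream of recon events spaced less than one second apart keeps the carrier off indefinitely, which matches the retriggering behaviour the patch describes.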


[PATCH net-next 2/6] arcnet: com20020: add enable and disable device on open/close

2015-10-21 Thread Michael Grzeschik

This patch changes the driver to work properly with the Linux netif
interface. The controller gets enabled on open and disabled on close.
Therefore it removes every bogus start of the transceiver. It only gets
enabled in com20020_netdev_open and disabled in com20020_netdev_close.

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/com20020.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/arcnet/com20020.c b/drivers/net/arcnet/com20020.c
index c82f323..436951b 100644
--- a/drivers/net/arcnet/com20020.c
+++ b/drivers/net/arcnet/com20020.c
@@ -118,7 +118,7 @@ int com20020_check(struct net_device *dev)
arcnet_outb(STARTIOcmd, ioaddr, COM20020_REG_W_COMMAND);
}
 
-   lp->config = TXENcfg | (lp->timeout << 3) | (lp->backplane << 2) | SUB_NODE;
+   lp->config = (lp->timeout << 3) | (lp->backplane << 2) | SUB_NODE;
/* set node ID to 0x42 (but transmitter is disabled, so it's okay) */
arcnet_outb(lp->config, ioaddr, COM20020_REG_W_CONFIG);
arcnet_outb(0x42, ioaddr, COM20020_REG_W_XREG);
@@ -131,11 +131,6 @@ int com20020_check(struct net_device *dev)
}
arc_printk(D_INIT_REASONS, dev, "status after reset: %X\n", status);
 
-   /* Enable TX */
-   lp->config |= TXENcfg;
-   arcnet_outb(lp->config, ioaddr, COM20020_REG_W_CONFIG);
-   arcnet_outb(arcnet_inb(ioaddr, 8), ioaddr, COM20020_REG_W_XREG);
-
arcnet_outb(CFLAGScmd | RESETclear | CONFIGclear,
ioaddr, COM20020_REG_W_COMMAND);
status = arcnet_inb(ioaddr, COM20020_REG_R_STATUS);
@@ -169,9 +164,33 @@ static int com20020_set_hwaddr(struct net_device *dev, void *addr)
return 0;
 }
 
+int com20020_netdev_open(struct net_device *dev)
+{
+   int ioaddr = dev->base_addr;
+   struct arcnet_local *lp = netdev_priv(dev);
+
+   lp->config |= TXENcfg;
+   arcnet_outb(lp->config, ioaddr, COM20020_REG_W_CONFIG);
+
+   return arcnet_open(dev);
+}
+
+int com20020_netdev_close(struct net_device *dev)
+{
+   int ioaddr = dev->base_addr;
+   struct arcnet_local *lp = netdev_priv(dev);
+
+   arcnet_close(dev);
+
+   /* disable transmitter */
+   lp->config &= ~TXENcfg;
+   arcnet_outb(lp->config, ioaddr, COM20020_REG_W_CONFIG);
+   return 0;
+}
+
 const struct net_device_ops com20020_netdev_ops = {
-   .ndo_open   = arcnet_open,
-   .ndo_stop   = arcnet_close,
+   .ndo_open   = com20020_netdev_open,
+   .ndo_stop   = com20020_netdev_close,
.ndo_start_xmit = arcnet_send_packet,
.ndo_tx_timeout = arcnet_timeout,
.ndo_set_mac_address = com20020_set_hwaddr,
@@ -215,7 +234,7 @@ int com20020_found(struct net_device *dev, int shared)
arcnet_outb(STARTIOcmd, ioaddr, COM20020_REG_W_COMMAND);
}
 
-   lp->config = TXENcfg | (lp->timeout << 3) | (lp->backplane << 2) | SUB_NODE;
+   lp->config = (lp->timeout << 3) | (lp->backplane << 2) | SUB_NODE;
/* Default 0x38 + register: Node ID */
arcnet_outb(lp->config, ioaddr, COM20020_REG_W_CONFIG);
arcnet_outb(dev->dev_addr[0], ioaddr, COM20020_REG_W_XREG);
@@ -274,7 +293,7 @@ static int com20020_reset(struct net_device *dev, int really_reset)
   dev->name, arcnet_inb(ioaddr, COM20020_REG_R_STATUS));
 
arc_printk(D_DEBUG, dev, "%s: %d: %s\n", __FILE__, __LINE__, __func__);
-   lp->config = TXENcfg | (lp->timeout << 3) | (lp->backplane << 2);
+   lp->config |= (lp->timeout << 3) | (lp->backplane << 2);
/* power-up defaults */
arcnet_outb(lp->config, ioaddr, COM20020_REG_W_CONFIG);
arc_printk(D_DEBUG, dev, "%s: %d: %s\n", __FILE__, __LINE__, __func__);
-- 
2.6.1
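
The enable-on-open / disable-on-close pattern from the patch above can be sketched with a shadow copy of the config register. This is a userspace illustration only: struct fake_card and the card_*() helpers are hypothetical, the register is a plain struct field instead of an I/O port, and the TXEN_CFG bit value is an assumption made for the example, not taken from the driver headers.

```c
#include <assert.h>
#include <stdint.h>

#define TXEN_CFG 0x20u	/* assumed bit position, for illustration only */

/* Hypothetical card model: a software shadow and the "hardware" register. */
struct fake_card {
	uint8_t config;		/* shadow value, like lp->config */
	uint8_t config_reg;	/* stands in for the config register */
};

/* Push the shadow value to the (simulated) hardware register. */
static void card_write_config(struct fake_card *c)
{
	c->config_reg = c->config;
}

/* Probe: program timeout and backplane bits, transmitter stays off. */
static void card_probe(struct fake_card *c, uint8_t timeout, uint8_t backplane)
{
	c->config = (uint8_t)((timeout << 3) | (backplane << 2));
	card_write_config(c);
}

/* Open: set the transmit-enable bit, keeping the other bits intact. */
static void card_open(struct fake_card *c)
{
	c->config |= TXEN_CFG;
	card_write_config(c);
}

/* Close: clear only the transmit-enable bit. */
static void card_close(struct fake_card *c)
{
	c->config &= (uint8_t)~TXEN_CFG;
	card_write_config(c);
}
```

Because open and close only toggle one bit with |= and &= ~, any path that rewrites the other fields can preserve the transmitter state in the same way, which is what the switch from = to |= in com20020_reset appears to be aiming for.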


[GIT PULL] ARCNET: code simplification and features

2015-10-21 Thread Michael Grzeschik

The following changes since commit 6ac311ae8bfb47de09f349e781e26373944d2ee3:
  
  Adding switchdev ageing notification on port bridged (2015-10-21 07:50:57 -0700)

are available in the git repository at:
  
  ssh+git://git.pengutronix.de/git/mgr/linux.git tags/arcnet-for-4.4-rc1

for you to fetch changes up to d68c66a32f8176c27b9a395d07ba8a8e374f6111:
  
  arcnet: add netif_carrier_on/off for reconnect (2015-10-21 17:27:59 +0200)


This series includes code simplification. The main changes are the correct
xceiver handling (enable/disable) of the com20020 cards. The driver now handles
link status change detection. The EAE PCI-ARCNET cards now make use of the
rotary encoded subdevice indexing and gained support for LED triggers on transmit
and reconnection events.


Michael Grzeschik (6):
  arcnet: move dev_free_skb to its only user
  arcnet: com20020: add enable and disable device on open/close
  arcnet: com20020-pci: set dev_port to the subdevice index
  arcnet: com20020-pci: add rotary index support
  arcnet: com20020-pci: add led trigger support
  arcnet: add netif_carrier_on/off for reconnect
 
 drivers/net/arcnet/arcdevice.h|  21 
 drivers/net/arcnet/arcnet.c   | 107 +++
 drivers/net/arcnet/com20020-pci.c | 107 +++
 drivers/net/arcnet/com20020.c |  39 ++--
 drivers/net/arcnet/com20020.h |  14 +
 5 files changed, 270 insertions(+), 18 deletions(-)

-- 
2.6.1

