Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-22 Thread Hannes Frederic Sowa
Hi Thomas,

On Thu, Oct 22, 2015, at 18:45, Thomas Graf wrote:
> On 10/22/15 at 05:00pm, Jiri Benc wrote:
> > On Thu, 22 Oct 2015 16:52:13 +0200, Nicolas Dichtel wrote:
> > > With the proposed scenario:
> > > 1. create netns 'new_netns'
> > > 2. in root netns, move the interface with ifindex 2 to new_netns
> > > 3. in new_netns, delete the interface with ifindex 2
> > > 4. in new_netns, create an interface - it will get ifindex 2
> > > 
> > > Operation 2 and 4 are done by dev_change_net_namespace() under 
> > > rtnl_lock().
> > > RTM_DELLINK(root netns) and RTM_NEWLINK(new_netns) are sent by this 
> > > function.
> > > It means that operation 3 has been done before and that 
> > > RTM_DELLINK(new_netns)
> > > has been sent before.
> > 
> > Imagine the application trying to configure the interface with ifindex 2
> > after your step 2. It constructs a netlink message and sends it to the
> > kernel; but while doing so, steps 3 and 4 happen. Now the application
> > ends up configuring a different interface than it intended to. After
> > that, it polls the netlink socket and receives the notifications about
> > interface disappearing and a new one appearing.
> > 
> > I don't see any way the user space application can prevent this. There
> > will always be a race between receiving netlink notifications and
> > sending config requests.
> > 
> > I guess Thomas Haller can elaborate more as he ran into this.
> 
> I understand the race but when does it occur? Whoever creates
> the original interface owns it and is responsible for its
> lifecycle. *Iff* for some reason multiple entities manipulate
> the interface, then it's probably a lot safer to just use flock
> or something similar to serialize access entirely in user space.

This only works if all networking configuration programs would
standardize on the same flock. Also, under memory pressure we lose
netlink monitor messages, so we need to deal with timeouts and retries
and manual sync up on the networking configuration, which makes this
scheme a lot harder. For normal socket io, where we specify e.g. ifindex
in sin6_addr, this is not really usable at all.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

2015-10-22 Thread Eric W. Biederman
Daniel Borkmann  writes:

> On 10/20/2015 08:56 PM, Eric W. Biederman wrote:
> ...
>> Just FYI:  Using a device for this kind of interface is pretty
>> much a non-starter as that quickly gets you into situations where
>> things do not work in containers.  If someone gets a version of device
>> namespaces past GregKH it might be up for discussion to use character
>> devices.
>
> Okay, you are referring to this discussion here:
>
>   http://thread.gmane.org/gmane.linux.kernel.containers/26760

That is a piece of it.  It is an old old discussion (which generally has
been handled poorly).  For the forseeable future device namespaces have
a firm NACK by GregKH.  Which means that dynamic character device based
interfaces do not work in containers.  Which means if you are not
talking about physical hardware, character devices are a poor fit.

Making a character based interface for eBPF not workable.

Eric

p.s.  There are plenty of reasons (even if privilege remains a
  requirement) to ask how can this functionality be used in a
  container.  If for no other reason than sandboxing privileged
  applications is typically a good idea.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Al Viro
On Thu, Oct 22, 2015 at 11:55:42AM +0100, Alan Burlison wrote:
> On 22/10/2015 05:21, Al Viro wrote:
> 
> >>Most of the work on using a file descriptor is local to the thread.
> >
> >Using - sure, but what of cacheline dirtied every time you resolve a
> >descriptor to file reference?
> 
> Don't you have to do that anyway, to do anything useful with the file?

Dirtying the cacheline that contains struct file itself is different, but
that's not per-descriptor.

> >In case of Linux we have two bitmaps and an array of pointers associated
> >with descriptor table.  They grow on demand (in parallel)
> > * reserving a descriptor is done under ->file_lock (dropped/regained
> >around memory allocation if we end up expanding the sucker, actual 
> >reassignment
> >of pointers to array/bitmaps is under that spinlock)
> > * installing a pointer is lockless (we wait for ongoing resize to
> >settle, RCU takes care of the rest)
> > * grabbing a file by index is lockless as well
> > * removing a pointer is under ->file_lock, so's replacing it by dup2().
> 
> Is that table per-process or global?

Usually it's per-process, but any thread could ask for a private instance
to work with (and then spawn more threads sharing that instance - or getting
independent copies).

It's common for Plan 9-inspired models - basically, you treat every thread
as a machine that consists of
* memory
* file descriptor table
* namespace
* signal handlers
...
* CPU (i.e. actual thread of execution).
The last part can't be shared; anything else can.  fork(2) variant used to
start new threads (clone(2) in case of Linux, rfork(2) in Plan 9 and *BSD)
is told which components should be copies of parent's ones and which should
be shared with the parent.  fork(2) is simply "copy everything except for the
namespace".  It's fairly common to have "share everything", but intermediate
variants are also possible.  There are constraints (e.g. you can't share
signal handlers without sharing the memory space), but descriptor table
can be shared independently from memory space just fine.  There's also a
way to say "unshare this, this and that components" - mapped to unshare(2) in
Linux and to rfork(2) in Plan 9.

Best way to think of that is to consider descriptor table as a first-class
object a thread can be connected to.  Usually you have one for each process,
with all threads belonging to that process connected to the same thing,
but that's just the most common use.

> I don't think that it's possible to claim that a non-atomic dup2()
> is POSIX-compliant.

Except that it's in non-normative part of dup2(2), AFAICS.  I certainly
agree that it would be a standard lawyering beyond reason, but "not
possible to claim" is too optimistic.  Maybe I'm just more cynical...

> ThreadA remains sat in accept on fd1 which is now a plain file, not
> a socket.

No.  accept() is not an operation on file descriptors; it's an operation on
file descriptions (pardon for use of that terminology).  They are specified
by passing descriptors, but there's a hell of a difference between e.g.
dup() or fcntl(,F_SETFD,) (operations on descriptors) and read() or lseek()
(operations on descriptions).

Lookups are done once per syscall; the only exception is F_SETFL{,W}, where
we recheck that descriptor is refering to the same thing before granting
the lock.

Again, POSIX is still underspecifying the semantics of shared descriptor
tables; back when the bulk of it had been written there had been no way
to have a descriptor -> description mapping changed under a syscall by
action of another thread.  Hell, they still hadn't picked on some things
that happened in early 80s, let alone early-to-mid 90s...

Linux and Solaris happen to cover these gaps differently; FreeBSD and
OpenBSD are probably closer to Linux variant, NetBSD - to Solaris one.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread Punit Vara
This patch is to the ath5k/eeprom.c that fixes up warning caught by
coccicheck:

-Unneeded variable: "ret". Return "0" on line 1733

Remove unneccesary variable ret created to return zero.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/ath5k/eeprom.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c 
b/drivers/net/wireless/ath/ath5k/eeprom.c
index 94d34ee..0c82ea5 100644
--- a/drivers/net/wireless/ath/ath5k/eeprom.c
+++ b/drivers/net/wireless/ath/ath5k/eeprom.c
@@ -1707,8 +1707,7 @@ ath5k_eeprom_read_spur_chans(struct ath5k_hw *ah)
struct ath5k_eeprom_info *ee = >ah_capabilities.cap_eeprom;
u32 offset;
u16 val;
-   int ret = 0, i;
-
+   int i;
offset = AR5K_EEPROM_CTL(ee->ee_version) +
AR5K_EEPROM_N_CTLS(ee->ee_version);
 
@@ -1730,7 +1729,7 @@ ath5k_eeprom_read_spur_chans(struct ath5k_hw *ah)
}
}
 
-   return ret;
+   return 0;
 }
 
 
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Casper . Dik

>On Thu, Oct 22, 2015 at 08:34:19AM +0200, casper@oracle.com wrote:
>> 
>> 
>> >And I'm really curious about the things Solaris would do with dup2() there.
>> >Does it take into account the possibility of new accept() coming just as
>> >dup2() is trying to terminate the ongoing ones?  Is there a window when
>> >descriptor-to-file lookups would fail?  Looks like a race/deadlock 
>> >country...
>> 
>> Solaris does not "terminate" threads, instead it tells them that the
>> file descriptor information used is stale and wkae's up the thread.
>
>Sorry, lousy wording - I meant "terminate syscall in another thread".
>Better yet, make that "what happens if new accept(newfd) comes while dup2()
>waits for affected syscalls in other threads to finish"?  Assuming it
>does wait, that is..

No there is no such window; the accept() call either returns EBADF
(dup2()) wins the race or it returns a new file descriptor (and dup2()
then closes the listening descriptor).

One or the other.

>While we are at it, what's the relative order of record locks removal
>and switching the meaning of newfd?  In our kernel it happens *after*
>the switchover (i.e. if another thread is waiting for a record lock held on
>any alias of newfd and we do dup2(oldfd, newfd), the waiter will not see
>the state with newfd still refering to what it used to; note that waiter
>might be using any descriptor refering to the file newfd used to refer
>to, so it won't be affected by the "wake those who had the meaning of
>their arguments change" side of things).

The external behaviour atomic; you cannot distinguish the order
between the closing of the original file (and waking up other threads
waiting for a record lock) or changing the file referenced by that newfd.

But this not include a global or process specific lock.

Casper

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] net: consolidate memcg socket buffer tracking and accounting

2015-10-22 Thread Johannes Weiner
On Thu, Oct 22, 2015 at 09:46:12PM +0300, Vladimir Davydov wrote:
> On Thu, Oct 22, 2015 at 12:21:31AM -0400, Johannes Weiner wrote:
> > The tcp memory controller has extensive provisions for future memory
> > accounting interfaces that won't materialize after all. Cut the code
> > base down to what's actually used, now and in the likely future.
> > 
> > - There won't be any different protocol counters in the future, so a
> >   direct sock->sk_memcg linkage is enough. This eliminates a lot of
> >   callback maze and boilerplate code, and restores most of the socket
> >   allocation code to pre-tcp_memcontrol state.
> > 
> > - There won't be a tcp control soft limit, so integrating the memcg
> 
> In fact, the code is ready for the "soft" limit (I mean min, pressure,
> max tuple), it just lacks a knob.

Yeah, but that's not going to materialize if the entire interface for
dedicated tcp throttling is considered obsolete.

> > @@ -1136,9 +1090,6 @@ static inline bool sk_under_memory_pressure(const 
> > struct sock *sk)
> > if (!sk->sk_prot->memory_pressure)
> > return false;
> >  
> > -   if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
> > -   return !!sk->sk_cgrp->memory_pressure;
> > -
> 
> AFAIU, now we won't shrink the window on hitting the limit, i.e. this
> patch subtly changes the behavior of the existing knobs, potentially
> breaking them.

Hm, but there is no grace period in which something meaningful could
happen with the window shrinking, is there? Any buffer allocation is
still going to fail hard.

I don't see how this would change anything in practice.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] bnx2x_config_vlan_mac called a NULL function pointer

2015-10-22 Thread Otto Sabart
Hello netdev,
I probably found a bug in kernel-4.3.0-0.rc5 (bnx2x driver). So I opened
new bug report in our bugzilla [0]. Michal Schmidt told me the best way
to solve an upstream bug is to contact you directly to netdev list.. so
here I am :).

Can somebody take a look at it?

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1273894


thanks,

Ota

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 1/2] iwlwifi: pcie: allow to build an A-MSDU using TSO core

2015-10-22 Thread Grumbach, Emmanuel


On 10/22/2015 05:27 AM, Eric Dumazet wrote:
> On Thu, 2015-10-22 at 00:14 +, Grumbach, Emmanuel wrote:
> 
>>
>> Well. I guess I should at least check, but even with very small MSS, our
>> device supports up to 20 pointers for the same 802.11 packet: 2 are for
>> metadata. So basically, so leaves me only 18 pointers. for each MSS I
>> need at least 2 (one for the headers and one for the payload), so I will
>> have at most 9 of these for one packet, even with a tiny MSS.
>>
> 
> I did not see in your patch where you made the checks about 18 segs in a
> TSO packet ?

It is in the other patch: iwlwifi: mvm: send large SKBs to the transport
mvm is the op_mode and the op_mode needs to make sure that the payload
fits in one 802.11 packet AND it doesn't exhaust the number of pointers.
I'll add a comment here.

> 
>> I agree that all this should be added to the code in a comment.
>> Speaking of which...
>> int tso_count_descs(struct sk_buff *skb)
>> {
>> /* The Marvell Way */
>> return skb_shinfo(skb)->gso_segs * 2 + skb_shinfo(skb)->nr_frags;
>> }
>>
>> What if there is some payload in the header?
>> To me it sounds safer to return:
>>
>> skb_shinfo(skb)->gso_segs * 2 + skb_shinfo(skb)->nr_frags + 1;
>>
>> or maybe to test if there is some payload in the header and then add 1?
>> If there is payload in the header, it should be considered as another
>> frag, shouldn't it?
> 
> Minimal count is gso_segs (one per MSS)
> 
> Then you have to add extra for the cases we have a mss spanning a frag
> in skb.
> 
> Thats a max of (skb_shinfo(skb)->nr_frags - 1) + (data_in_head() ? 1 :
> 0);
> So I believe formula would be correct.
> 

I needed a piece of paper and a few drawings to understand you are
right... :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] xfrm: Fix pmtu discovery for local generated packets.

2015-10-22 Thread Steffen Klassert
Commit 044a832a777 ("xfrm: Fix local error reporting crash
with interfamily tunnels") moved the setting of skb->protocol
behind the last access of the inner mode family to fix an
interfamily crash. Unfortunately now skb->protocol might not
be set at all, so we fail dispatch to the inner address family.
As a reault, the local error handler is not called and the
mtu value is not reported back to userspace.

We fix this by setting skb->protocol on message size errors
before we call xfrm_local_error.

Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily 
tunnels")
Signed-off-by: Steffen Klassert 
---
 net/ipv4/xfrm4_output.c | 2 ++
 net/ipv6/xfrm6_output.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 2878dbf..41a2613 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -30,6 +30,8 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 
mtu = dst_mtu(skb_dst(skb));
if (skb->len > mtu) {
+   skb->protocol = htons(ETH_P_IP);
+
if (skb->sk)
xfrm_local_error(skb, mtu);
else
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index be033f2..e15feb7 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -79,6 +79,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 
if (!skb->ignore_df && skb->len > mtu) {
skb->dev = dst->dev;
+   skb->protocol = htons(ETH_P_IPV6);
 
if (xfrm6_local_dontfrag(skb))
xfrm6_local_rxpmtu(skb, mtu);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] xfrm: Fix state threshold configuration from userspace

2015-10-22 Thread Steffen Klassert
From: Michael Rossberg 

Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.

Signed-off-by: Michael Rossberg 
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_user.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8de9e3..24e06a2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1928,8 +1928,10 @@ static int xfrm_new_ae(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
struct nlattr *re = attrs[XFRMA_REPLAY_ESN_VAL];
struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+   struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+   struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
-   if (!lt && !rp && !re)
+   if (!lt && !rp && !re && !et && !rt)
return err;
 
/* pedantic mode - thou shalt sayeth replaceth */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request (net): ipsec 2015-10-22

2015-10-22 Thread Steffen Klassert
1) Fix IPsec pre-encap fragmentation for GSO packets.
   From Herbert Xu.

2) Fix some header checks in _decode_session6.
   We skip the header informations if the data pointer points
   already behind the header in question for some protocols.
   This is because we call pskb_may_pull with a negative value
   converted to unsigened int from pskb_may_pull in this case.
   Skipping the header informations can lead to incorrect policy
   lookups. From Mathias Krause.

3) Allow to change the replay threshold and expiry timer of a
   state without having to set other attributes like replay
   counter and byte lifetime. Changing these other attributes
   may break the SA. From Michael Rossberg.

4) Fix pmtu discovery for local generated packets.
   We may fail dispatch to the inner address family.
   As a reault, the local error handler is not called
   and the mtu value is not reported back to userspace.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 724a7636ad026a3a68f3fc626ccd04111f65cfd9:

  Merge branch 'sctp-fixes' (2015-09-03 15:43:06 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to ca064bd89363a6e7e71b1c5226ff1b718957a9d4:

  xfrm: Fix pmtu discovery for local generated packets. (2015-10-19 10:30:05 
+0200)


Herbert Xu (1):
  ipv6: Fix IPsec pre-encap fragmentation check

Mathias Krause (1):
  xfrm6: Fix ICMPv6 and MH header checks in _decode_session6

Michael Rossberg (1):
  xfrm: Fix state threshold configuration from userspace

Steffen Klassert (1):
  xfrm: Fix pmtu discovery for local generated packets.

 net/ipv4/xfrm4_output.c |  2 ++
 net/ipv6/xfrm6_output.c | 18 --
 net/ipv6/xfrm6_policy.c |  6 --
 net/xfrm/xfrm_user.c|  4 +++-
 4 files changed, 21 insertions(+), 9 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6

2015-10-22 Thread Steffen Klassert
From: Mathias Krause 

Ensure there's enough data left prior calling pskb_may_pull(). If
skb->data was already advanced, we'll call pskb_may_pull() with a
negative value converted to unsigned int -- leading to a huge
positive value. That won't matter in practice as pskb_may_pull()
will likely fail in this case, but it leads to underflow reports on
kernels handling such kind of over-/underflows, e.g. a PaX enabled
kernel instrumented with the size_overflow plugin.

Reported-by: satmd 
Reported-and-tested-by: Marcin Jurkowski 
Signed-off-by: Mathias Krause 
Cc: PaX Team 
Signed-off-by: Steffen Klassert 
---
 net/ipv6/xfrm6_policy.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa28..f10b940 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -178,7 +178,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
return;
 
case IPPROTO_ICMPV6:
-   if (!onlyproto && pskb_may_pull(skb, nh + offset + 2 - 
skb->data)) {
+   if (!onlyproto && (nh + offset + 2 < skb->data ||
+   pskb_may_pull(skb, nh + offset + 2 - skb->data))) {
u8 *icmp;
 
nh = skb_network_header(skb);
@@ -192,7 +193,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
case IPPROTO_MH:
offset += ipv6_optlen(exthdr);
-   if (!onlyproto && pskb_may_pull(skb, nh + offset + 3 - 
skb->data)) {
+   if (!onlyproto && (nh + offset + 3 < skb->data ||
+   pskb_may_pull(skb, nh + offset + 3 - skb->data))) {
struct ip6_mh *mh;
 
nh = skb_network_header(skb);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] ipv6: Fix IPsec pre-encap fragmentation check

2015-10-22 Thread Steffen Klassert
From: Herbert Xu 

The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
packets.  That is, we perform fragmentation pre-encap rather than
post-encap.

A check was added later to ensure that proper MTU information is
passed back for locally generated traffic.  Unfortunately this
check was performed on all IPsec packets, including transport-mode
packets.

What's more, the check failed to take GSO into account.

The end result is that transport-mode GSO packets get dropped at
the check.

This patch fixes it by moving the tunnel mode check forward as well
as adding the GSO check.

Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
Signed-off-by: Herbert Xu 
Signed-off-by: Steffen Klassert 
---
 net/ipv6/xfrm6_output.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 09c76a7..be033f2 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -136,6 +136,7 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff 
*skb)
struct dst_entry *dst = skb_dst(skb);
struct xfrm_state *x = dst->xfrm;
int mtu;
+   bool toobig;
 
 #ifdef CONFIG_NETFILTER
if (!x) {
@@ -144,25 +145,29 @@ static int __xfrm6_output(struct sock *sk, struct sk_buff 
*skb)
}
 #endif
 
+   if (x->props.mode != XFRM_MODE_TUNNEL)
+   goto skip_frag;
+
if (skb->protocol == htons(ETH_P_IPV6))
mtu = ip6_skb_dst_mtu(skb);
else
mtu = dst_mtu(skb_dst(skb));
 
-   if (skb->len > mtu && xfrm6_local_dontfrag(skb)) {
+   toobig = skb->len > mtu && !skb_is_gso(skb);
+
+   if (toobig && xfrm6_local_dontfrag(skb)) {
xfrm6_local_rxpmtu(skb, mtu);
return -EMSGSIZE;
-   } else if (!skb->ignore_df && skb->len > mtu && skb->sk) {
+   } else if (!skb->ignore_df && toobig && skb->sk) {
xfrm_local_error(skb, mtu);
return -EMSGSIZE;
}
 
-   if (x->props.mode == XFRM_MODE_TUNNEL &&
-   ((skb->len > mtu && !skb_is_gso(skb)) ||
-   dst_allfrag(skb_dst(skb {
+   if (toobig || dst_allfrag(skb_dst(skb)))
return ip6_fragment(sk, skb,
x->outer_mode->afinfo->output_finish);
-   }
+
+skip_frag:
return x->outer_mode->afinfo->output_finish(sk, skb);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Casper . Dik

From: Al Viro 

>Except that in this case "correctness" is the matter of rather obscure and
>ill-documented areas in POSIX.  Don't get me wrong - this semantics isn't
>inherently bad, but it's nowhere near being an absolute requirement.

It would more fruitful to have such a discussion in one of the OpenGroup 
mailing lists; people gathered there have a lot of experience and it is 
also possible to fix the standard when it turns out that it indeed as 
vague as you claim it is (I don't quite agree with that)

Casper

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2] can: xilinx: use readl/writel instead of ioread/iowrite

2015-10-22 Thread Appana Durga Kedareswara Rao
Hi Marc,

> -Original Message-
> From: Marc Kleine-Budde [mailto:m...@pengutronix.de]
> Sent: Thursday, October 22, 2015 1:52 PM
> To: Arnd Bergmann; linux-arm-ker...@lists.infradead.org
> Cc: Appana Durga Kedareswara Rao; Anirudha Sarangi; w...@grandegger.com;
> Michal Simek; Soren Brinkmann; Appana Durga Kedareswara Rao;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
> c...@vger.kernel.org
> Subject: Re: [PATCH 1/2] can: xilinx: use readl/writel instead of 
> ioread/iowrite
> 
> On 10/22/2015 10:14 AM, Arnd Bergmann wrote:
> > On Thursday 22 October 2015 10:16:02 Kedareswara rao Appana wrote:
> >> The driver only supports memory-mapped I/O [by ioremap()], so
> >> readl/writel is actually the right thing to do, IMO.
> >> During the validation of this driver or IP on ARM 64-bit processor
> >> while sending lot of packets observed that the tx packet drop with
> >> iowrite Putting the barriers for each tx fifo register write fixes
> >> this issue Instead of barriers using writel also fixed this issue.
> >>
> >> Signed-off-by: Kedareswara rao Appana 
> >
> > The two should really do the same thing: iowrite32() is just a static
> > inline calling writel() on both ARM32 and ARM64. On which kernel
> > version did you observe the difference? It's possible that an older
> > version used CONFIG_GENERIC_IOMAP, which made this slightly more
> expensive.
> >
> > If there are barriers that you want to get rid of for performance
> > reasons, you should use writel_relaxed(), but be careful to
> > synchronize them correctly with regard to DMA. It should be fine in
> > this driver, as it does not perform any DMA, but be aware that there
> > is no big-endian version of
> > writel_relaxed() at the moment.
> 
> We don't have DMA in CAN drivers, but usually a certain write triggers 
> sending.
> Do we need a barrier before triggering the sending?

Yes During validation of this IP on ARM 64 bit processor with using iowrite32() 
and sending a lot of packets it requires barriers before triggering the send.
With using writel() barriers are not needed.

Regards,
Kedar.

> 
> Marc
> 
> --
> Pengutronix e.K.  | Marc Kleine-Budde   |
> Industrial Linux Solutions| Phone: +49-231-2826-924 |
> Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



[PATCH 0/3] Refactor nested mdiobus read/write functions

2015-10-22 Thread Neil Armstrong
In order to avoid locked signal false positive for nested mdiobus
read/write calls, nested code was introduced in mv88e6xxx and
mdio-mux.
But mv88e6060 also needs such nested mdiobus read/write calls.
For sake of refactoring, introduce nested variants of mdiobus read/write
and make them used by mv88e6xxx and mv88e6060.
In a next patch, mdio-mux should also use these variant calls.

Neil Armstrong (3):
  net: phy: Add nested variants of mdiobus read/write
  net: dsa: Make mv88e6xxx use nested mdiobus read/write
  net: dsa: Make mv88e6060 use nested mdiobus read/write

 drivers/net/dsa/mv88e6060.c |  4 ++--
 drivers/net/dsa/mv88e6xxx.c | 46 -
 drivers/net/phy/mdio_bus.c  | 55 +
 include/linux/phy.h |  2 ++
 4 files changed, 68 insertions(+), 39 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] net: phy: Add nested variants of mdiobus read/write

2015-10-22 Thread Neil Armstrong
Since nested variants of mdiobus_read/write are used in multiple
drivers, add nested variants in the mdiobus core.

Suggested-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/phy/mdio_bus.c | 55 ++
 include/linux/phy.h|  2 ++
 2 files changed, 57 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 12f44c5..88cb459 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -372,6 +372,33 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int 
addr)
 EXPORT_SYMBOL(mdiobus_scan);

 /**
+ * mdiobus_read_nested - Nested version of the mdiobus_read function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to read
+ *
+ * In case of nested MDIO bus access avoid lockdep false positives by
+ * using mutex_lock_nested().
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum)
+{
+   int retval;
+
+   BUG_ON(in_interrupt());
+
+   mutex_lock_nested(>mdio_lock, SINGLE_DEPTH_NESTING);
+   retval = bus->read(bus, addr, regnum);
+   mutex_unlock(>mdio_lock);
+
+   return retval;
+}
+EXPORT_SYMBOL(mdiobus_read_nested);
+
+/**
  * mdiobus_read - Convenience function for reading a given MII mgmt register
  * @bus: the mii_bus struct
  * @addr: the phy address
@@ -396,6 +423,34 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
 EXPORT_SYMBOL(mdiobus_read);

 /**
+ * mdiobus_write_nested - Nested version of the mdiobus_write function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ *
+ * In case of nested MDIO bus access avoid lockdep false positives by
+ * using mutex_lock_nested().
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
+{
+   int err;
+
+   BUG_ON(in_interrupt());
+
+   mutex_lock_nested(>mdio_lock, SINGLE_DEPTH_NESTING);
+   err = bus->write(bus, addr, regnum, val);
+   mutex_unlock(>mdio_lock);
+
+   return err;
+}
+EXPORT_SYMBOL(mdiobus_write_nested);
+
+/**
  * mdiobus_write - Convenience function for writing a given MII mgmt register
  * @bus: the mii_bus struct
  * @addr: the phy address
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 4c477e6..05fde31 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -213,7 +213,9 @@ static inline struct mii_bus *devm_mdiobus_alloc(struct 
device *dev)
 void devm_mdiobus_free(struct device *dev, struct mii_bus *bus);
 struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
 int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
+int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);


 #define PHY_INTERRUPT_DISABLED 0x0
-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Al Viro
On Thu, Oct 22, 2015 at 05:44:58AM +0100, Al Viro wrote:

> Except that in this case "correctness" is the matter of rather obscure and
> ill-documented areas in POSIX.  Don't get me wrong - this semantics isn't
> inherently bad, but it's nowhere near being an absolute requirement.

PS: in principle, a fairly ugly trick might suffice for accept(2), but
I'm less than happy with going there.  Namely, we could
* have ->accept() get descriptor number
* have ->flush() get descriptor number in addition to current->files
and have it DTRT for sockets in the middle of accept(2).

However, in addition to being ugly as hell, it has the problem with the points
where we call ->flush(), specifically do_dup2() and __close_fd().
It's done *after* the replacement/removal from descriptor table, so another
socket might have already gotten the same descriptor and we'd get spurious
termination of accept(2).

And I'm really curious about the things Solaris would do with dup2() there.
Does it take into account the possibility of new accept() coming just as
dup2() is trying to terminate the ongoing ones?  Is there a window when
descriptor-to-file lookups would fail?  Looks like a race/deadlock country...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper

2015-10-22 Thread Alexei Starovoitov

On 10/21/15 10:31 PM, Wangnan (F) wrote:

+if ((attr->type != PERF_TYPE_RAW &&
+ !(attr->type == PERF_TYPE_SOFTWARE &&
+   attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+ attr->type != PERF_TYPE_HARDWARE) ||
+attr->inherit) {


This 'if' statement is so complex. What about using a inline function
instead?


hmm. don't see how inline function will help readability.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 1/2] vhost: introduce vhost_has_work()

2015-10-22 Thread Michael S. Tsirkin
On Thu, Oct 22, 2015 at 01:27:28AM -0400, Jason Wang wrote:
> This path introduces a helper which can give a hint for whether or not
> there's a work queued in the work list.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vhost.c | 6 ++
>  drivers/vhost/vhost.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index eec2f11..d42d11e 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -245,6 +245,12 @@ void vhost_work_queue(struct vhost_dev *dev, struct 
> vhost_work *work)
>  }
>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>  
> +bool vhost_has_work(struct vhost_dev *dev)
> +{
> + return !list_empty(>work_list);
> +}
> +EXPORT_SYMBOL_GPL(vhost_has_work);
> +
>  void vhost_poll_queue(struct vhost_poll *poll)
>  {
>   vhost_work_queue(poll->dev, >work);


This doesn't take a lock so it's unreliable.
I think it's ok in this case since it's just
an optimization - but pls document this.

> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 4772862..ea0327d 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -37,6 +37,7 @@ struct vhost_poll {
>  
>  void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
>  void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
> +bool vhost_has_work(struct vhost_dev *dev);
>  
>  void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
>unsigned long mask, struct vhost_dev *dev);
> -- 
> 1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] can: xilinx: use readl/writel instead of ioread/iowrite

2015-10-22 Thread Arnd Bergmann
On Thursday 22 October 2015 10:21:58 Marc Kleine-Budde wrote:
> On 10/22/2015 10:14 AM, Arnd Bergmann wrote:
> > On Thursday 22 October 2015 10:16:02 Kedareswara rao Appana wrote:
> >> The driver only supports memory-mapped I/O [by ioremap()],
> >> so readl/writel is actually the right thing to do, IMO.
> >> During the validation of this driver or IP on ARM 64-bit processor
> >> while sending lot of packets observed that the tx packet drop with iowrite
> >> Putting the barriers for each tx fifo register write fixes this issue
> >> Instead of barriers using writel also fixed this issue.
> >>
> >> Signed-off-by: Kedareswara rao Appana 
> > 
> > The two should really do the same thing: iowrite32() is just a static inline
> > calling writel() on both ARM32 and ARM64. On which kernel version did you
> > observe the difference? It's possible that an older version used
> > CONFIG_GENERIC_IOMAP, which made this slightly more expensive.
> > 
> > If there are barriers that you want to get rid of for performance reasons,
> > you should use writel_relaxed(), but be careful to synchronize them 
> > correctly
> > with regard to DMA. It should be fine in this driver, as it does not
> > perform any DMA, but be aware that there is no big-endian version of
> > writel_relaxed() at the moment.
> 
> We don't have DMA in CAN drivers, but usually a certain write triggers
> sending. Do we need a barrier before triggering the sending?

No, the relaxed writes are not well-defined across architectures. On
ARM, the CPU guarantees that stores to an MMIO area are still in order
with respect to one another, the barrier is only needed for actual DMA,
so you are fine. I would expect the same to be true everywhere,
otherwise a lot of other drivers would be broken too.

To be on the safe side, that last write() could remain a writel() instead
of writel_relaxed(), and that would be guaranteed to work on all
architectures even if they end relax the ordering between MMIO writes.
If there is a measurable performance difference, just use writel_relaxed()
and add a comment.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/1] eventfd: implementation of EFD_MASK flag

2015-10-22 Thread Damian Hobson-Garcia
Hello,

> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index ff0b981..87de343 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h

>  
> -/*
> - * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
> - * new flags, since they might collide with O_* ones. We want
> - * to re-use O_* flags that couldn't possibly have a meaning
> - * from eventfd, in order to leave a free define-space for
> - * shared O_* flags.
> - */
> -#define EFD_SEMAPHORE (1 << 0)
> -#define EFD_CLOEXEC O_CLOEXEC
> -#define EFD_NONBLOCK O_NONBLOCK
> -
> -#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> -#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
> -
>  struct file;
>  
>  #ifdef CONFIG_EVENTFD
> diff --git a/include/uapi/linux/eventfd.h b/include/uapi/linux/eventfd.h
> new file mode 100644
> index 000..097dcad
> --- /dev/null
> +++ b/include/uapi/linux/eventfd.h
> @@ -0,0 +1,33 @@

> +
> +/*
> + * CAREFUL: Check include/asm-generic/fcntl.h when defining
> + * new flags, since they might collide with O_* ones. We want
> + * to re-use O_* flags that couldn't possibly have a meaning
> + * from eventfd, in order to leave a free define-space for
> + * shared O_* flags.
> + */
> +
> +/* Provide semaphore-like semantics for reads from the eventfd. */
> +#define EFD_SEMAPHORE (1 << 0)
> +/* Provide event mask semantics for the eventfd. */
> +#define EFD_MASK (1 << 1)
> +/*  Set the close-on-exec (FD_CLOEXEC) flag on the eventfd. */
> +#define EFD_CLOEXEC O_CLOEXEC
> +/*  Create the eventfd in non-blocking mode. */
> +#define EFD_NONBLOCK O_NONBLOCK
> +#endif /* _UAPI_LINUX_EVENTFD_H */
> 

Since the latest version of this patch adds only the EFD_MASK definition
to the eventfd header, I was wondering if it was really
necessary/recommended to move the definitions from linux/eventfd.h to
linux/uapi/eventfd.h.  From my understanding, the EFD_SEMAPHORE (and now
EFD_MASK) define(s) are provided to user space from the libc headers
only. Any advice would be greatly appreciated.

Thank you,
Damian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-22 Thread Ingo Molnar

* Wangnan (F)  wrote:

> 
> 
> On 2015/10/22 0:57, Peter Zijlstra wrote:
> >On Wed, Oct 21, 2015 at 11:06:47PM +0800, pi3orama wrote:
> >>>So explain; how does this eBPF stuff work.
> >>I think I get your point this time, and let me explain the eBPF stuff to 
> >>you.
> >>
> >>You are aware that BPF programmer can break the system in this way:
> >>
> >>A=get_non_local_perf_event()
> >>perf_event_read_local(A)
> >>BOOM!
> >>
> >>However the above logic is impossible because BPF program can't work this
> >>way.
> >>
> >>First of all, it is impossible for a BPF program directly invoke a
> >>kernel function.  Doesn't like kernel module, BPF program can only
> >>invoke functions designed for them, like what this patch does. So the
> >>ability of BPF programs is strictly restricted by kernel. If we don't
> >>allow BPF program call perf_event_read_local() across core, we can
> >>check this and return error in function we provide for them.
> >>
> >>Second: there's no way for a BPF program directly access a perf event.
> >>All perf events have to be wrapped by a map and be accessed by BPF
> >>functions described above. We don't allow BPF program fetch array
> >>element from that map. So pointers of perf event is safely protected
> >>from BPF program.
> >>
> >>In summary, your either-or logic doesn't hold in BPF world. A BPF
> >>program can only access perf event in a highly restricted way. We
> >>don't allow it calling perf_event_read_local() across core, so it
> >>can't.
> >Urgh, that's still horridly inconsistent. Can we please come up with a
> >consistent interface to perf?
> 
> BPF program and kernel module are two different worlds as I said before.
> 
> I don't think making them to share a common interface is a good idea because 
> such sharing will give BPF programs too much freedom than it really need, 
> then 
> it will be hard prevent them to do something bad. If we really need kernel 
> interface, I think what we need is kernel module, not BPF program.

What do you mean, as this does not parse for me.

We obviously can (and very likely should) make certain perf functionality 
available to BPF programs.

It should still be a well defined yet flexible iterface, with safe behavior, 
obviously - all in line with existing BPF sandboxing principles.

'Kernel modules' don't enter this consideration at all, not sure why you 
mention 
them - all this functionality is also available if CONFIG_MODULES is turned off 
completely.

Thanks,

Ingo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] bpf: fix bpf_perf_event_read() helper

2015-10-22 Thread Wangnan (F)



On 2015/10/22 14:21, Alexei Starovoitov wrote:

On 10/21/15 10:31 PM, Wangnan (F) wrote:

+if ((attr->type != PERF_TYPE_RAW &&
+ !(attr->type == PERF_TYPE_SOFTWARE &&
+   attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+ attr->type != PERF_TYPE_HARDWARE) ||
+attr->inherit) {


This 'if' statement is so complex. What about using a inline function
instead?


hmm. don't see how inline function will help readability.



For example (not tested):

 static inline bool perf_event_can_insert_to_map(struct perf_event_attr 
*attr)

 {
/* is inherit? */
if (attr->inherit)
return false;

/* is software event? */
if (attr->type == PERF_TYPE_SOFTWARE)
if (attr->config == PERF_COUNT_SW_BPF_OUTPUT)
return true;
else
return false;

/* Comment... */
if (attr->type == PERF_TYPE_RAW)
return true;
if (attr->type == PERF_TYPE_HARDWARE)
return true;
return false;
 }

 ...
 if (!perf_event_can_insert_to_map(attr))


Do you think redability is improved?

Thank you.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] can: xilinx: use readl/writel instead of ioread/iowrite

2015-10-22 Thread Arnd Bergmann
On Thursday 22 October 2015 10:16:02 Kedareswara rao Appana wrote:
> The driver only supports memory-mapped I/O [by ioremap()],
> so readl/writel is actually the right thing to do, IMO.
> During the validation of this driver or IP on ARM 64-bit processor
> while sending lot of packets observed that the tx packet drop with iowrite
> Putting the barriers for each tx fifo register write fixes this issue
> Instead of barriers using writel also fixed this issue.
> 
> Signed-off-by: Kedareswara rao Appana 

The two should really do the same thing: iowrite32() is just a static inline
calling writel() on both ARM32 and ARM64. On which kernel version did you
observe the difference? It's possible that an older version used
CONFIG_GENERIC_IOMAP, which made this slightly more expensive.

If there are barriers that you want to get rid of for performance reasons,
you should use writel_relaxed(), but be careful to synchronize them correctly
with regard to DMA. It should be fine in this driver, as it does not
perform any DMA, but be aware that there is no big-endian version of
writel_relaxed() at the moment.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Casper . Dik

>It's been said that the current mechanisms in Linux & some BSD variants 
>can be subject to races, and the behaviour exhibited doesn't conform to 
>POSIX, for example requiring the use of shutdown() on unconnected 
>sockets because close() doesn't kick off other threads accept()ing on 
>the same fd. I'd be interested to hear if there's a better and more 
>performant way of handling the situation that doesn't involve doing the 
>sort of bookkeeping Casper described,.

Of course, the implementation is now around 18 years old; clearly a lot of 
things have changed since then.

In the particular case of Linux close() on a socket, surely it must be 
possible to detect at close that it is a listening socket and that you are 
about to close the last reference; the kernel could then do the shutdown() 
all by itself.

Casper

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: dsa: mv88e6060: Fix false positive lockdep splat

2015-10-22 Thread Neil Armstrong
On 10/21/2015 06:14 PM, Andrew Lunn wrote:
> On Wed, Oct 21, 2015 at 05:37:45PM +0200, Neil Armstrong wrote:
>> Like the change made for mv88e6xxx, use mutex_lock_nested() to avoid
>> lockdep to give false positives because of nested MDIO busses.
> 
> Hi Neil
> 
> We now have three instances of this, since mdio-mux.c has the same
> code. Maybe now would be a good time to refactor this code into
> mdiobus_read_nested() and mdiobus_write_nested() in mdio_bus.c?  At
> the same time, add BUG_ON(in_interrupt()) similar to the non-nested
> versions?
> 
>   Andrew
> 
Well, mdio-mux also calls switch_fn inside the mdio_lock, clean refactoring
would introduce a separate lock and call the nested variants.
Is that ok ? Can someone test mdio-mux is I make the change ?


Neil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: dsa: mv88e6060: Fix false positive lockdep splat

2015-10-22 Thread Neil Armstrong
On 10/21/2015 06:14 PM, Andrew Lunn wrote:
> On Wed, Oct 21, 2015 at 05:37:45PM +0200, Neil Armstrong wrote:
>> Like the change made for mv88e6xxx, use mutex_lock_nested() to avoid
>> lockdep to give false positives because of nested MDIO busses.
> 
> Hi Neil
> 
> We now have three instances of this, since mdio-mux.c has the same
> code. Maybe now would be a good time to refactor this code into
> mdiobus_read_nested() and mdiobus_write_nested() in mdio_bus.c?  At
> the same time, add BUG_ON(in_interrupt()) similar to the non-nested
> versions?
> 
>   Andrew
> 
Well, mdio-mux also calls switch_fn inside the mdio_lock, clean refactoring
would introduce a separate lock and call the nested variants.
Is that ok ? Can someone test mdio-mux if I make the change ?


Neil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2] can: xilinx: use readl/writel instead of ioread/iowrite

2015-10-22 Thread Appana Durga Kedareswara Rao
Hi Arnd,

> -Original Message-
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Thursday, October 22, 2015 1:45 PM
> To: linux-arm-ker...@lists.infradead.org
> Cc: Appana Durga Kedareswara Rao; Anirudha Sarangi; w...@grandegger.com;
> m...@pengutronix.de; Michal Simek; Soren Brinkmann; Appana Durga
> Kedareswara Rao; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
> c...@vger.kernel.org
> Subject: Re: [PATCH 1/2] can: xilinx: use readl/writel instead of 
> ioread/iowrite
> 
> On Thursday 22 October 2015 10:16:02 Kedareswara rao Appana wrote:
> > The driver only supports memory-mapped I/O [by ioremap()], so
> > readl/writel is actually the right thing to do, IMO.
> > During the validation of this driver or IP on ARM 64-bit processor
> > while sending lot of packets observed that the tx packet drop with
> > iowrite Putting the barriers for each tx fifo register write fixes
> > this issue Instead of barriers using writel also fixed this issue.
> >
> > Signed-off-by: Kedareswara rao Appana 
> 
> The two should really do the same thing: iowrite32() is just a static inline 
> calling
> writel() on both ARM32 and ARM64. On which kernel version did you observe the
> difference? It's possible that an older version used CONFIG_GENERIC_IOMAP,
> which made this slightly more expensive.

I observed this issue with the 4.0.0 kernel version

> 
> If there are barriers that you want to get rid of for performance reasons, you
> should use writel_relaxed(), but be careful to synchronize them correctly with
> regard to DMA. It should be fine in this driver, as it does not perform any 
> DMA,
> but be aware that there is no big-endian version of
> writel_relaxed() at the moment.

There is no DMA in CAN for this IP.

Regards,
Kedar.

> 
>   Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH iproute2] ip-address: properly display zero IPv4 peer address

2015-10-22 Thread Thomas Haller
Kernel allows for zero IPv4 peer addresses (IFA_ADDRESS):

   ip address add 192.168.5.1 peer 0.0.0.0/24 dev dummy

which is distinct from a usual address like:

   ip address add 192.168.5.1/24 dev dummy
   ip address add 192.168.5.1 peer 192.168.5.1/24 dev dummy

For IPv4, a missing IFA_ADDRESS attribute means that the peer
is 0.0.0.0. See inet_fill_ifaddr(), which does:

  if ((ifa->ifa_address &&
   nla_put_in_addr(skb, IFA_ADDRESS, ifa->ifa_address)) ||

Signed-off-by: Thomas Haller 
---
 ip/ipaddress.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index f290205..6fc4520 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -949,7 +949,8 @@ int print_addrinfo(const struct sockaddr_nl *who, struct 
nlmsghdr *n,
 
if (!rta_tb[IFA_LOCAL])
rta_tb[IFA_LOCAL] = rta_tb[IFA_ADDRESS];
-   if (!rta_tb[IFA_ADDRESS])
+   if (!rta_tb[IFA_ADDRESS] &&
+   ifa->ifa_family != AF_INET)
rta_tb[IFA_ADDRESS] = rta_tb[IFA_LOCAL];
 
if (filter.ifindex && filter.ifindex != ifa->ifa_index)
@@ -1034,16 +1035,18 @@ int print_addrinfo(const struct sockaddr_nl *who, 
struct nlmsghdr *n,
RTA_DATA(rta_tb[IFA_LOCAL]),
abuf, sizeof(abuf)));
 
-   if (rta_tb[IFA_ADDRESS] == NULL ||
+   if (rta_tb[IFA_ADDRESS] != NULL &&
memcmp(RTA_DATA(rta_tb[IFA_ADDRESS]), 
RTA_DATA(rta_tb[IFA_LOCAL]),
   ifa->ifa_family == AF_INET ? 4 : 16) == 0) {
fprintf(fp, "/%d ", ifa->ifa_prefixlen);
} else {
fprintf(fp, " peer %s/%d ",
-   format_host(ifa->ifa_family,
-   RTA_PAYLOAD(rta_tb[IFA_ADDRESS]),
-   RTA_DATA(rta_tb[IFA_ADDRESS]),
-   abuf, sizeof(abuf)),
+   rta_tb[IFA_ADDRESS]
+   ? format_host(ifa->ifa_family,
+ 
RTA_PAYLOAD(rta_tb[IFA_ADDRESS]),
+ 
RTA_DATA(rta_tb[IFA_ADDRESS]),
+ abuf, sizeof(abuf))
+   : "0.0.0.0",
ifa->ifa_prefixlen);
}
}
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 05/15] net: wireless: ti: Return flow can be simplified for wl1271_cmd_interrogate

2015-10-22 Thread Eliad Peller
On Wed, Oct 21, 2015 at 10:07 PM, Punit Vara  wrote:
> Remove int ret suggested by kbuild test robot
>
> This patch is to the wlcore/acx.c file that fixes up warning
> reported by coccicheck:
>
> WARNING: end returns can be simplified if negative or 0 value
>
> Prefer direct return value instead of writing 2-3 more sentence.
>
> Signed-off-by: Punit Vara 
> ---
>  drivers/net/wireless/ti/wlcore/acx.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/net/wireless/ti/wlcore/acx.c 
> b/drivers/net/wireless/ti/wlcore/acx.c
> index f28fa3b..6b566d9 100644
> --- a/drivers/net/wireless/ti/wlcore/acx.c
> +++ b/drivers/net/wireless/ti/wlcore/acx.c
> @@ -158,16 +158,11 @@ out:
>  int wl1271_acx_mem_map(struct wl1271 *wl, struct acx_header *mem_map,
>size_t len)
>  {
> -   int ret;
>
> wl1271_debug(DEBUG_ACX, "acx mem map");
>
> -   ret = wl1271_cmd_interrogate(wl, ACX_MEM_MAP, mem_map,
> +   return wl1271_cmd_interrogate(wl, ACX_MEM_MAP, mem_map,
>  sizeof(struct acx_header), len);
> -   if (ret < 0)
> -   return ret;
> -
> -   return 0;
>  }
>
this changes the return value in case of positive values.
have you verified it can't happen / won't affect the code flow?
i'm not sure you really want to blindly patch it...

Eliad.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: dsa: mv88e6060: Fix false positive lockdep splat

2015-10-22 Thread Neil Armstrong
Hi Andrew,

On 10/21/2015 06:14 PM, Andrew Lunn wrote:
> On Wed, Oct 21, 2015 at 05:37:45PM +0200, Neil Armstrong wrote:
>> Like the change made for mv88e6xxx, use mutex_lock_nested() to avoid
>> lockdep to give false positives because of nested MDIO busses.
> 
> Hi Neil
> 
> We now have three instances of this, since mdio-mux.c has the same
> code. Maybe now would be a good time to refactor this code into
> mdiobus_read_nested() and mdiobus_write_nested() in mdio_bus.c?  At
> the same time, add BUG_ON(in_interrupt()) similar to the non-nested
> versions?
> 
>   Andrew
> 
Indeed, you are right, I will post a serie with this refactoring.

Neil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling

2015-10-22 Thread Wangnan (F)



On 2015/10/22 15:39, Ingo Molnar wrote:

* Wangnan (F)  wrote:


[SNIP]


In summary, your either-or logic doesn't hold in BPF world. A BPF
program can only access perf event in a highly restricted way. We
don't allow it calling perf_event_read_local() across core, so it
can't.

Urgh, that's still horridly inconsistent. Can we please come up with a
consistent interface to perf?

BPF program and kernel module are two different worlds as I said before.

I don't think making them to share a common interface is a good idea because
such sharing will give BPF programs too much freedom than it really need, then
it will be hard prevent them to do something bad. If we really need kernel
interface, I think what we need is kernel module, not BPF program.

What do you mean, as this does not parse for me.


Because I'm not very sure what the meaning of "inconsistent" in
Peter's words...

I think what Peter want us to do is to provide similar (consistent) 
interface

between kernel and eBPF that, if kernel reads from a perf_event through
perf_event_read_local(struct perf_event *), BPF program should
do this work with similar code, or at least similar logic, so
we need to create handler for a perf event, and provide a BPF function
called BPF_FUNC_perf_event_read_local then pass such handler to it.

I don't think like this because if we want kernel interface we'd
better use kernel module, not eBPF so I mentioned kernel module here.

Ingo, do you think BPF inerface should be *consistent* with anything?

Thank you.


We obviously can (and very likely should) make certain perf functionality
available to BPF programs.

It should still be a well defined yet flexible iterface, with safe behavior,
obviously - all in line with existing BPF sandboxing principles.

'Kernel modules' don't enter this consideration at all, not sure why you mention
them - all this functionality is also available if CONFIG_MODULES is turned off
completely.

Thanks,

Ingo




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] net: mvneta: add ethtool statistics

2015-10-22 Thread Russell King
Add support for the ethtool statistic interface, returning the full set
of statistics which both Armada 370, 38x and Armada XP can support.

Tested-by: Andrew Lunn 
Signed-off-by: Russell King 
---
 drivers/net/ethernet/marvell/mvneta.c | 108 ++
 1 file changed, 108 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 514df76fc70f..69d80ca4fba2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -277,6 +277,50 @@
 
 #define MVNETA_RX_BUF_SIZE(pkt_size)   ((pkt_size) + NET_SKB_PAD)
 
+struct mvneta_statistic {
+   unsigned short offset;
+   unsigned short type;
+   const char name[ETH_GSTRING_LEN];
+};
+
+#define T_REG_32   32
+#define T_REG_64   64
+
+static const struct mvneta_statistic mvneta_statistics[] = {
+   { 0x3000, T_REG_64, "good_octets_received", },
+   { 0x3010, T_REG_32, "good_frames_received", },
+   { 0x3008, T_REG_32, "bad_octets_received", },
+   { 0x3014, T_REG_32, "bad_frames_received", },
+   { 0x3018, T_REG_32, "broadcast_frames_received", },
+   { 0x301c, T_REG_32, "multicast_frames_received", },
+   { 0x3050, T_REG_32, "unrec_mac_control_received", },
+   { 0x3058, T_REG_32, "good_fc_received", },
+   { 0x305c, T_REG_32, "bad_fc_received", },
+   { 0x3060, T_REG_32, "undersize_received", },
+   { 0x3064, T_REG_32, "fragments_received", },
+   { 0x3068, T_REG_32, "oversize_received", },
+   { 0x306c, T_REG_32, "jabber_received", },
+   { 0x3070, T_REG_32, "mac_receive_error", },
+   { 0x3074, T_REG_32, "bad_crc_event", },
+   { 0x3078, T_REG_32, "collision", },
+   { 0x307c, T_REG_32, "late_collision", },
+   { 0x2484, T_REG_32, "rx_discard", },
+   { 0x2488, T_REG_32, "rx_overrun", },
+   { 0x3020, T_REG_32, "frames_64_octets", },
+   { 0x3024, T_REG_32, "frames_65_to_127_octets", },
+   { 0x3028, T_REG_32, "frames_128_to_255_octets", },
+   { 0x302c, T_REG_32, "frames_256_to_511_octets", },
+   { 0x3030, T_REG_32, "frames_512_to_1023_octets", },
+   { 0x3034, T_REG_32, "frames_1024_to_max_octets", },
+   { 0x3038, T_REG_64, "good_octets_sent", },
+   { 0x3040, T_REG_32, "good_frames_sent", },
+   { 0x3044, T_REG_32, "excessive_collision", },
+   { 0x3048, T_REG_32, "multicast_frames_sent", },
+   { 0x304c, T_REG_32, "broadcast_frames_sent", },
+   { 0x3054, T_REG_32, "fc_sent", },
+   { 0x300c, T_REG_32, "internal_mac_transmit_err", },
+};
+
 struct mvneta_pcpu_stats {
struct  u64_stats_sync syncp;
u64 rx_packets;
@@ -312,6 +356,8 @@ struct mvneta_port {
unsigned int speed;
unsigned int tx_csum_limit;
int use_inband_status:1;
+
+   u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
 };
 
 /* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -2875,6 +2921,65 @@ static int mvneta_ethtool_set_ringparam(struct 
net_device *dev,
return 0;
 }
 
+static void mvneta_ethtool_get_strings(struct net_device *netdev, u32 sset,
+  u8 *data)
+{
+   if (sset == ETH_SS_STATS) {
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
+   memcpy(data + i * ETH_GSTRING_LEN,
+  mvneta_statistics[i].name, ETH_GSTRING_LEN);
+   }
+}
+
+static void mvneta_ethtool_update_stats(struct mvneta_port *pp)
+{
+   const struct mvneta_statistic *s;
+   void __iomem *base = pp->base;
+   u32 high, low, val;
+   int i;
+
+   for (i = 0, s = mvneta_statistics;
+s < mvneta_statistics + ARRAY_SIZE(mvneta_statistics);
+s++, i++) {
+   val = 0;
+
+   switch (s->type) {
+   case T_REG_32:
+   val = readl_relaxed(base + s->offset);
+   break;
+   case T_REG_64:
+   /* Docs say to read low 32-bit then high */
+   low = readl_relaxed(base + s->offset);
+   high = readl_relaxed(base + s->offset + 4);
+   val = (u64)high << 32 | low;
+   break;
+   }
+
+   pp->ethtool_stats[i] += val;
+   }
+}
+
+static void mvneta_ethtool_get_stats(struct net_device *dev,
+struct ethtool_stats *stats, u64 *data)
+{
+   struct mvneta_port *pp = netdev_priv(dev);
+   int i;
+
+   mvneta_ethtool_update_stats(pp);
+
+   for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
+   *data++ = pp->ethtool_stats[i];
+}
+
+static int mvneta_ethtool_get_sset_count(struct net_device *dev, int sset)
+{
+   if (sset == ETH_SS_STATS)
+   return ARRAY_SIZE(mvneta_statistics);
+   return -EOPNOTSUPP;
+}
+
 

[PATCH v3 0/2] mvneta ethtool statistics

2015-10-22 Thread Russell King - ARM Linux
Sorry for v3 - I forgot to update the commit message on patch 1 as
requested by Marcin.

This short series adds ethtool statistics reporting to mvneta.  Having
discussed with Andrew on IRC, we decided I'd pick up his patch into my
series.

My change for patch 1 compared to the previous RFC splits out the
reading of the statistics from the hardware into a separate function,
in order to facilitate work going on elsewhere to arrange for the
statistics to be preserved across a suspend/resume cycle.

 drivers/net/ethernet/marvell/mvneta.c | 117 +-
 1 file changed, 115 insertions(+), 2 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] net: mvneta: Fix clearing of MIB statistics

2015-10-22 Thread Russell King
From: Andrew Lunn 

The existing function to clear the MIB statatistics was using the
wrong address for the registers. Also, the counters would of been
cleared when the interface was brought up, not during the
probe. Fix both of these.

Signed-off-by: Andrew Lunn 
Signed-off-by: Russell King 
---
 drivers/net/ethernet/marvell/mvneta.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 69d80ca4fba2..8829c481fe1b 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -100,6 +100,8 @@
 #define MVNETA_TXQ_CMD   0x2448
 #define  MVNETA_TXQ_DISABLE_SHIFT8
 #define  MVNETA_TXQ_ENABLE_MASK  0x00ff
+#define MVNETA_RX_DISCARD_FRAME_COUNT   0x2484
+#define MVNETA_OVERRUN_FRAME_COUNT  0x2488
 #define MVNETA_GMAC_CLOCK_DIVIDER0x24f4
 #define  MVNETA_GMAC_1MS_CLOCK_ENABLEBIT(31)
 #define MVNETA_ACC_MODE  0x2500
@@ -191,7 +193,7 @@
 #define  MVNETA_GMAC_AN_FLOW_CTRL_EN BIT(11)
 #define  MVNETA_GMAC_CONFIG_FULL_DUPLEX  BIT(12)
 #define  MVNETA_GMAC_AN_DUPLEX_ENBIT(13)
-#define MVNETA_MIB_COUNTERS_BASE 0x3080
+#define MVNETA_MIB_COUNTERS_BASE 0x3000
 #define  MVNETA_MIB_LATE_COLLISION   0x7c
 #define MVNETA_DA_FILT_SPEC_MCAST0x3400
 #define MVNETA_DA_FILT_OTH_MCAST 0x3500
@@ -564,6 +566,8 @@ static void mvneta_mib_counters_clear(struct mvneta_port 
*pp)
/* Perform dummy reads from MIB counters */
for (i = 0; i < MVNETA_MIB_LATE_COLLISION; i += 4)
dummy = mvreg_read(pp, (MVNETA_MIB_COUNTERS_BASE + i));
+   dummy = mvreg_read(pp, MVNETA_RX_DISCARD_FRAME_COUNT);
+   dummy = mvreg_read(pp, MVNETA_OVERRUN_FRAME_COUNT);
 }
 
 /* Get System Network Statistics */
@@ -792,7 +796,6 @@ static void mvneta_port_up(struct mvneta_port *pp)
u32 q_map;
 
/* Enable all initialized TXs. */
-   mvneta_mib_counters_clear(pp);
q_map = 0;
for (queue = 0; queue < txq_number; queue++) {
struct mvneta_tx_queue *txq = >txqs[queue];
@@ -1076,6 +1079,8 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
mvreg_write(pp, MVNETA_INTR_ENABLE,
(MVNETA_RXQ_INTR_ENABLE_ALL_MASK
 | MVNETA_TXQ_INTR_ENABLE_ALL_MASK));
+
+   mvneta_mib_counters_clear(pp);
 }
 
 /* Set max sizes for tx queues */
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


XFRM Policy lookup in flowcache

2015-10-22 Thread Naji M Abdulla
Hi All,

I have a question regarding speeding up xfrm policy lookup in flow cache.

We are working on a product where  3 policies (in,fwd and out) are
setup per tunnel.

Tunnels comes up and down all the time, so the policy data base will
be keep changing.

Looks like there is an impact in performance since every time policy
database is changed, policies in flow cache gets invalidated. This
eventually results in almost all lookups going to policy database.

Is there any mechanism to get around this to get faster policy lookup?

thanks

--naji
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] mm: vmscan: report vmpressure at the level of reclaim activity

2015-10-22 Thread Vladimir Davydov
On Thu, Oct 22, 2015 at 12:21:35AM -0400, Johannes Weiner wrote:
...
> @@ -2437,6 +2439,10 @@ static bool shrink_zone(struct zone *zone, struct 
> scan_control *sc,
>   }
>   }
>  
> + vmpressure(sc->gfp_mask, memcg,
> +sc->nr_scanned - scanned,
> +sc->nr_reclaimed - reclaimed);
> +
>   /*
>* Direct reclaim and kswapd have to scan all memory
>* cgroups to fulfill the overall scan target for the
> @@ -2454,10 +2460,6 @@ static bool shrink_zone(struct zone *zone, struct 
> scan_control *sc,
>   }
>   } while ((memcg = mem_cgroup_iter(root, memcg, )));
>  
> - vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
> -sc->nr_scanned - nr_scanned,
> -sc->nr_reclaimed - nr_reclaimed);
> -
>   if (sc->nr_reclaimed - nr_reclaimed)
>   reclaimable = true;
>  

I may be mistaken, but AFAIU this patch subtly changes the behavior of
vmpressure visible from the userspace: w/o this patch a userspace
process will only receive a notification for a memory cgroup only if
*this* memory cgroup calls reclaimer; with this patch userspace
notification will be issued even if reclaimer is invoked by any cgroup
up the hierarchy.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread Punit Vara
Remove black line suggested by Sergei

This patch is to the ath5k/eeprom.c that fixes up warning caught by
coccicheck:

Unneeded variable: "ret". Return "0" on line 980

Remove unneeded variable ret created to return zero.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/wcn36xx/main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 900e72a..94bcc08 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -935,8 +935,6 @@ static const struct ieee80211_ops wcn36xx_ops = {
 
 static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
 {
-   int ret = 0;
-
static const u32 cipher_suites[] = {
WLAN_CIPHER_SUITE_WEP40,
WLAN_CIPHER_SUITE_WEP104,
@@ -977,7 +975,7 @@ static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
wcn->hw->sta_data_size = sizeof(struct wcn36xx_sta);
wcn->hw->vif_data_size = sizeof(struct wcn36xx_vif);
 
-   return ret;
+   return 0;
 }
 
 static int wcn36xx_platform_get_resources(struct wcn36xx *wcn,
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] mm: memcontrol: account socket memory on unified hierarchy

2015-10-22 Thread Vladimir Davydov
On Thu, Oct 22, 2015 at 12:21:33AM -0400, Johannes Weiner wrote:
...
> @@ -5500,13 +5524,38 @@ void sock_release_memcg(struct sock *sk)
>   */
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> + unsigned int batch = max(CHARGE_BATCH, nr_pages);
>   struct page_counter *counter;
> + bool force = false;
>  
> - if (page_counter_try_charge(>skmem, nr_pages, ))
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + if (page_counter_try_charge(>skmem, nr_pages, ))
> + return true;
> + page_counter_charge(>skmem, nr_pages);
> + return false;
> + }
> +
> + if (consume_stock(memcg, nr_pages))
>   return true;
> +retry:
> + if (page_counter_try_charge(>memory, batch, ))
> + goto done;

Currently, we use memcg->memory only for charging memory pages. Besides,
every page charged to this counter (including kmem) has ->mem_cgroup
field set appropriately. This looks consistent and nice. As an extra
benefit, we can track all pages charged to a memory cgroup via
/proc/kapgecgroup.

Now, you charge "window size" to it, which AFAIU isn't necessarily equal
to the amount of memory actually consumed by the cgroup for socket
buffers. I think this looks ugly and inconsistent with the existing
behavior. I agree that we need to charge socker buffers to ->memory, but
IMO we should do that per each skb page, using memcg_kmem_charge_kmem
somewhere in alloc_skb_with_frags invoking the reclaimer just as we do
for kmalloc, while tcp window size control should stay aside.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-10-22 Thread Måns Rullgård
David Miller  writes:

> From: Mans Rullgard 
> Date: Thu, 22 Oct 2015 15:02:38 +0100
>
>> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
>> It is an almost complete rewrite of a driver originally found in
>> a Sigma Designs 2.6.22 tree.
>> 
>> Signed-off-by: Mans Rullgard 
>
> The netdev list is going to have to see patches #1 and #2 as well as
> your cover letter in order to review and integrate this driver
> properly.

Patch 1 only added the "aurora" DT vendor prefix.  Patch 2 was the DT
binding for this device.  I'll include netdev when I send an updated
version.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Alan Burlison

On 22/10/2015 18:05, Al Viro wrote:


Oh, for...  Right in this thread an example of complete BS has been quoted
from POSIX close(2).  The part about closing a file when the last descriptor
gets closed.  _Nothing_ is POSIX-compliant in that respect (nor should
it be).


That's not exactly what it says, we've already discussed, for example in 
the case of pending async IO on a filehandle.



Semantics around the distinction between file descriptors and
 file descriptions is underspecified, not to mention being very poorly
written.


I agree that part could do with some polishing.


You want to add something along the lines of "if any action by another thread
changes the mapping from file descriptors to file descriptions for any
file descriptor passed to syscall, such and such things should happen" - go
ahead and specify what should happen.  As it is, I don't see anything of
that sort in e.g. accept(2).  And no,
[EBADF]
The socket argument is not a valid file descriptor.
in there is nowhere near being unambiguous enough - everyone agrees that
argument should be a valid descriptor at the time of call, but I would be
very surprised to find _any_ implementation (including Solaris one)
recheck that upon exit to userland.


The scenario I described previously, where dup2() is used to modify a fd 
that's being used in accept, result in the accept call terminating in 
the same way as if close had been called on it - Casper gave details 
earlier.



For more bullshit from the same source (issue 7, close(2)):
If fildes refers to a socket, close() shall cause the socket to be
destroyed. If the socket is in connection-mode, and the SO_LINGER
option is set for the socket with non-zero linger time, and the socket
has untransmitted data, then close() shall block for up to the current
linger interval until all data is transmitted.
I challenge you to find *any* implementation that would have
fd = socket(...);
close(dup(fd));
do what this wonder of technical prose clearly requests.  In the same text we
also have
When all file descriptors associated with a pipe or FIFO special file
are closed, any data remaining in the pipe or FIFO shall be discarded.
as well as explicit (and underspecified, but perhaps they do it elsewhere)
"last close" in parts related to sockets and ptys.


Yes, Casper has just reported that to TOG, see 
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/11573. 
Our assessment is that sockets should behave the same way as plain 
files, i.e. 'last close'.



And that is not to mention the dup2(2) wording in there:
If fildes2 is already a valid open file descriptor, it shall be
closed first
which is (a) inviting misinterpretation that would make the damn thing
non-atomic (the only mentioning of atomicity is in non-normative sections)


I've already explained why I believe atomic behaviour of dup2() is 
required by POSIX. If you feel it's not clear then we can ask POSIX for 
clarification.



and (b) says fsck-all about the effects of closing descriptor.  The latter
is a problem, since nothing in close(2) bothers making a distinction between
the effects specific to particular syscall and those common to all ways of
closing a descriptor.  And no, it's not a nitpicking - consider e.g. the
parts concerning the order of events triggered by close(2) (such and such
should be completed before close(2) returns); should it be taken as "same
events should be completed before newfd is associated with the file description
refered to by oldfd"?  It _is_ user-visible, since close(2) removes fcntl
locks.  Sure, there is (otherwise unexplained)
The dup2() function is not intended for use in critical regions
as a synchronization mechanism.
down in informative sections, so one can infer that event order here isn't
to be relied upon.  With no way to guess whether the event order concerning
e.g. effect on ongoing accept(newfd) is any different in that respect.


I think "it shall be closed first" makes it pretty clear that what is 
expected is the same behaviour as any direct invocation of close, and 
that has to happen before the reassignment. What makes you believe 
that's isn't the case?



The entire area in Issue 7 stinks.  It might make sense to try and fix it
up, but let's not pretend that what's in there right now does specify the
semantics in this kind of situations.


Sorry, I disagree.


I'm not saying that Solaris approach yields an inherently bad semantics or
that it's impossible to implement without high scalability price and/or
high memory footprint.  But waving the flag of POSIX compliance when you
are actually talking about the ways your implementation plugs the holes in
a badly incomplete spec...


Personally I believe the spec is clear enough to allow an unambiguous 
interpretation of the required behavior in this area. If you think there 
are areas where the Solaris 

Re: [PATCH 3/8] net: consolidate memcg socket buffer tracking and accounting

2015-10-22 Thread Vladimir Davydov
On Thu, Oct 22, 2015 at 12:21:31AM -0400, Johannes Weiner wrote:
> The tcp memory controller has extensive provisions for future memory
> accounting interfaces that won't materialize after all. Cut the code
> base down to what's actually used, now and in the likely future.
> 
> - There won't be any different protocol counters in the future, so a
>   direct sock->sk_memcg linkage is enough. This eliminates a lot of
>   callback maze and boilerplate code, and restores most of the socket
>   allocation code to pre-tcp_memcontrol state.
> 
> - There won't be a tcp control soft limit, so integrating the memcg

In fact, the code is ready for the "soft" limit (I mean min, pressure,
max tuple), it just lacks a knob.

>   code into the global skmem limiting scheme complicates things
>   unnecessarily. Replace all that with simple and clear charge and
>   uncharge calls--hidden behind a jump label--to account skb memory.
> 
> - The previous jump label code was an elaborate state machine that
>   tracked the number of cgroups with an active socket limit in order
>   to enable the skmem tracking and accounting code only when actively
>   necessary. But this is overengineered: it was meant to protect the
>   people who never use this feature in the first place. Simply enable
>   the branches once when the first limit is set until the next reboot.
> 
...
> @@ -1136,9 +1090,6 @@ static inline bool sk_under_memory_pressure(const 
> struct sock *sk)
>   if (!sk->sk_prot->memory_pressure)
>   return false;
>  
> - if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
> - return !!sk->sk_cgrp->memory_pressure;
> -

AFAIU, now we won't shrink the window on hitting the limit, i.e. this
patch subtly changes the behavior of the existing knobs, potentially
breaking them.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] net: encx24j600: Fix mask to update LED configuration

2015-10-22 Thread jon
From: Jon Ringle 

This fixes the mask used to update the LED configuration so that it clears
the necessary bits as well as setting the bits according to the mask.
Also reverse the LED configuration to show the Link state + collisions in
LEDA and the Link state + TX/RX events in LEDB.

Signed-off-by: Jon Ringle 
---
 drivers/net/ethernet/microchip/encx24j600.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/microchip/encx24j600.c 
b/drivers/net/ethernet/microchip/encx24j600.c
index e1329d9..bf08ce2 100644
--- a/drivers/net/ethernet/microchip/encx24j600.c
+++ b/drivers/net/ethernet/microchip/encx24j600.c
@@ -617,10 +617,10 @@ static int encx24j600_hw_init(struct encx24j600_priv 
*priv)
   (eidled & REVID_MASK) >> REVID_SHIFT);
 
/* PHY Leds: link status,
-* LEDA: Link + transmit/receive events
-* LEDB: Link State + colision events
+* LEDA: Link State + collision events
+* LEDB: Link State + transmit/receive events
 */
-   encx24j600_update_reg(priv, EIDLED, 0xbc00, 0xbc00);
+   encx24j600_update_reg(priv, EIDLED, 0xff00, 0xcb00);
 
/* Loopback disabled */
encx24j600_write_reg(priv, MACON1, 0x9);
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/2] geneve: implement support for IPv6-based tunnels

2015-10-22 Thread John W. Linville
NOTE: Link-local IPv6 addresses for remote endpoints are not supported,
since the driver currently has no capacity for binding a geneve
interface to a specific link.

Signed-off-by: John W. Linville 
---
v5:
- wrap declaration of sock6 in geneve_dev with IS_ENABLED(CONFIG_IPV6)
- remove superfluous '!!' when assigning geneve->collect_md to bool
- use skb_scrub_packet in IPv4 tx path as well 
- check for NULL ip_tunnel_info pointer in geneve[6]_xmit_skb
- use ipv6_addr_equal for comparing IPv6 addresses
- more use of IS_ENABLED(CONFIG_IPV6) for preserving build integrity
- reject link-local ipv6 address for remote tunnel endpoint

v4:
- treat mode field of ip_tunnel_info as flags
- add a missing IS_ENABLED(CONFIG_IPV6) to geneve_rx
- remove unneeded flags field in geneve_dev
- NULL-check parameter for __geneve_sock_release
- check remote socket family for AF_UNSPEC in geneve_configure
- rename geneve_get_{rt,dst} as geneve_get_{v4_rt,v6_dst}
- refactor some error handling in the xmit paths

v3:
- declare geneve_remote_unspec as static

v2:
- do not require remote address for tx on metadata tunnels
- pass correct sockaddr family to udp_tun_rx_dst in geneve_rx
- accommodate both ipv4 and ipv6 sockets open on same tunnel
- move declaration of geneve_get_dst for aesthetic purposes

 drivers/net/geneve.c | 482 +++
 include/uapi/linux/if_link.h |   1 +
 2 files changed, 395 insertions(+), 88 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index cde29f8a37bf..47f7512f02cf 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -46,16 +46,27 @@ struct geneve_net {
 
 static int geneve_net_id;
 
+union geneve_addr {
+   struct sockaddr_in sin;
+   struct sockaddr_in6 sin6;
+   struct sockaddr sa;
+};
+
+static union geneve_addr geneve_remote_unspec = { .sa.sa_family = AF_UNSPEC, };
+
 /* Pseudo network device */
 struct geneve_dev {
struct hlist_node  hlist;   /* vni hash table */
struct net *net;/* netns for packet i/o */
struct net_device  *dev;/* netdev for geneve tunnel */
-   struct geneve_sock *sock;   /* socket used for geneve tunnel */
+   struct geneve_sock *sock4;  /* IPv4 socket used for geneve tunnel */
+#if IS_ENABLED(CONFIG_IPV6)
+   struct geneve_sock *sock6;  /* IPv6 socket used for geneve tunnel */
+#endif
u8 vni[3];  /* virtual network ID for tunnel */
u8 ttl; /* TTL override */
u8 tos; /* TOS override */
-   struct sockaddr_in remote;  /* IPv4 address for link partner */
+   union geneve_addr  remote;  /* IP address for link partner */
struct list_head   next;/* geneve's per namespace list */
__be16 dst_port;
bool   collect_md;
@@ -103,12 +114,32 @@ static struct geneve_dev *geneve_lookup(struct 
geneve_sock *gs,
vni_list_head = >vni_list[hash];
hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
-   addr == geneve->remote.sin_addr.s_addr)
+   addr == geneve->remote.sin.sin_addr.s_addr)
return geneve;
}
return NULL;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
+struct in6_addr addr6, u8 vni[])
+{
+   struct hlist_head *vni_list_head;
+   struct geneve_dev *geneve;
+   __u32 hash;
+
+   /* Find the device for this VNI */
+   hash = geneve_net_vni_hash(vni);
+   vni_list_head = >vni_list[hash];
+   hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
+   if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
+   ipv6_addr_equal(, >remote.sin6.sin6_addr))
+   return geneve;
+   }
+   return NULL;
+}
+#endif
+
 static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
 {
return (struct genevehdr *)(udp_hdr(skb) + 1);
@@ -121,24 +152,49 @@ static void geneve_rx(struct geneve_sock *gs, struct 
sk_buff *skb)
struct metadata_dst *tun_dst = NULL;
struct geneve_dev *geneve = NULL;
struct pcpu_sw_netstats *stats;
-   struct iphdr *iph;
-   u8 *vni;
+   struct iphdr *iph = NULL;
__be32 addr;
-   int err;
+   static u8 zero_vni[3];
+   u8 *vni;
+   int err = 0;
+   sa_family_t sa_family;
+#if IS_ENABLED(CONFIG_IPV6)
+   struct ipv6hdr *ip6h = NULL;
+   struct in6_addr addr6;
+   static struct in6_addr zero_addr6;
+#endif
+
+   sa_family = gs->sock->sk->sk_family;
 
-   iph = ip_hdr(skb); /* outer IP header... */
+   if (sa_family == AF_INET) {
+   iph = ip_hdr(skb); /* outer IP header... */
 
-   if 

[PATCH v5 2/2] geneve: handle ipv6 priority like ipv4 tos

2015-10-22 Thread John W. Linville
Signed-off-by: John W. Linville 
Reported-by: Jesse Gross 
Reviewed-by: Jesse Gross 
---
v5 -- same as previous revision

 drivers/net/geneve.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 47f7512f02cf..4d4d8ca9eb7a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -755,6 +755,7 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff 
*skb,
struct geneve_dev *geneve = netdev_priv(dev);
struct geneve_sock *gs6 = geneve->sock6;
struct dst_entry *dst = NULL;
+   __u8 prio;
 
memset(fl6, 0, sizeof(*fl6));
fl6->flowi6_mark = skb->mark;
@@ -763,7 +764,16 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff 
*skb,
if (info) {
fl6->daddr = info->key.u.ipv6.dst;
fl6->saddr = info->key.u.ipv6.src;
+   fl6->flowi6_tos = RT_TOS(info->key.tos);
} else {
+   prio = geneve->tos;
+   if (prio == 1) {
+   const struct iphdr *iip = ip_hdr(skb);
+
+   prio = ip_tunnel_get_dsfield(iip, skb);
+   }
+
+   fl6->flowi6_tos = RT_TOS(prio);
fl6->daddr = geneve->remote.sin6.sin6_addr;
}
 
@@ -884,8 +894,9 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, 
struct net_device *dev,
struct geneve_dev *geneve = netdev_priv(dev);
struct geneve_sock *gs6 = geneve->sock6;
struct dst_entry *dst = NULL;
+   const struct iphdr *iip; /* interior IP header */
struct flowi6 fl6;
-   __u8 ttl;
+   __u8 prio, ttl;
__be16 sport;
bool udp_csum;
int err;
@@ -914,6 +925,8 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, 
struct net_device *dev,
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
skb_reset_mac_header(skb);
 
+   iip = ip_hdr(skb);
+
if (info) {
const struct ip_tunnel_key *key = >key;
u8 *opts = NULL;
@@ -930,6 +943,7 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, 
struct net_device *dev,
if (unlikely(err))
goto err;
 
+   prio = ip_tunnel_ecn_encap(key->tos, iip, skb);
ttl = key->ttl;
} else {
udp_csum = false;
@@ -938,13 +952,14 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, 
struct net_device *dev,
if (unlikely(err))
goto err;
 
+   prio = ip_tunnel_ecn_encap(fl6.flowi6_tos, iip, skb);
ttl = geneve->ttl;
if (!ttl && ipv6_addr_is_multicast())
ttl = 1;
ttl = ttl ? : ip6_dst_hoplimit(dst);
}
err = udp_tunnel6_xmit_skb(dst, gs6->sock->sk, skb, dev,
-  , , 0, ttl,
+  , , prio, ttl,
   sport, geneve->dst_port, !udp_csum);
 
iptunnel_xmit_stats(err, >stats, dev->tstats);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Casper . Dik

>On Thu, Oct 22, 2015 at 08:24:51PM +0200, casper@oracle.com wrote:
>
>> The external behaviour atomic; you cannot distinguish the order
>> between the closing of the original file (and waking up other threads
>> waiting for a record lock) or changing the file referenced by that newfd.
>> 
>> But this not include a global or process specific lock.
>
>Interesting...  Do you mean that decriptor-to-file lookup blocks until that
>rundown finishes?

For that particular file descriptor, yes.  (I'm assuming you mean the 
Solaris kernel running down all lwps who have a system in progress on that 
particular file descriptor).  All other fd to file lookups are not blocked 
at all by this locking.

It should be clear that any such occurrences are application errors and 
should be hardly ever seen in practice.  It is also known when this is 
needed so it is hardly even attempted.

Casper

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/15] net: wireless: ath: use | instead of + for summing bitmasks

2015-10-22 Thread punit vara
On Thu, Oct 22, 2015 at 3:13 AM, Sergei Shtylyov
 wrote:
> Hello.
>
> On 10/21/2015 05:55 PM, Punit Vara wrote:
>
>> This patch is to the ath10k/pci.h file that fixes following warning
>
>
>pci.c, you mean?
>
>
>>   reported by coccicheck:
>>
>> WARNING: sum of probable bitmasks, consider |
>>
>> I have replaced + with OR operator | for summing bitmasks
>>
>> Signed-off-by: Punit Vara 
>> ---
>>   drivers/net/wireless/ath/ath10k/pci.c | 10 +-
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/pci.c
>> b/drivers/net/wireless/ath/ath10k/pci.c
>> index 1046ab6..165a318 100644
>> --- a/drivers/net/wireless/ath/ath10k/pci.c
>> +++ b/drivers/net/wireless/ath/ath10k/pci.c
>> @@ -775,7 +775,7 @@ static u32 ath10k_pci_targ_cpu_to_ce_addr(struct
>> ath10k *ar, u32 addr)
>> switch (ar->hw_rev) {
>> case ATH10K_HW_QCA988X:
>> case ATH10K_HW_QCA6174:
>> -   val = (ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
>> +   val = (ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
>>   CORE_CTRL_ADDRESS) &
>>0x7ff) << 21;
>> break;
>> @@ -1443,10 +1443,10 @@ static void ath10k_pci_irq_msi_fw_mask(struct
>> ath10k *ar)
>> switch (ar->hw_rev) {
>> case ATH10K_HW_QCA988X:
>> case ATH10K_HW_QCA6174:
>> -   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
>> +   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
>> CORE_CTRL_ADDRESS);
>> val &= ~CORE_CTRL_PCIE_REG_31_MASK;
>> -   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS +
>> +   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS |
>
>
>Don't think these 2 are justified.
>
>
>> @@ -1464,10 +1464,10 @@ static void ath10k_pci_irq_msi_fw_unmask(struct
>> ath10k *ar)
>> switch (ar->hw_rev) {
>> case ATH10K_HW_QCA988X:
>> case ATH10K_HW_QCA6174:
>> -   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
>> +   val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS |
>> CORE_CTRL_ADDRESS);
>> val |= CORE_CTRL_PCIE_REG_31_MASK;
>> -   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS +
>> +   ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS |
>>CORE_CTRL_ADDRESS, val);
>
>
>And these too.
>
> [...]
>
> MBR, Sergei
>

 CORE_CTRL_ADDRESS is 0x so it will not mask .. these patch should
be rejected ...I have modified by looking at coccicheck .Actually
First time I have used that tool  I do know it can also generate false
warning sometime .I have experience about checkpatch.pl ..Sorry for
this patch rest I have resend
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread punit vara
On Thu, Oct 22, 2015 at 3:16 AM, Sergei Shtylyov
 wrote:
> On 10/21/2015 05:55 PM, Punit Vara wrote:
>
>> This patch is to the ath5k/eeprom.c that fixes up warning caught by
>> coccicheck:
>>
>> Unneeded variable: "ret". Return "0" on line 980
>>
>> Remove unneeded variable ret created to return zero.
>>
>> Signed-off-by: Punit Vara 
>> ---
>>   drivers/net/wireless/ath/wcn36xx/main.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/wcn36xx/main.c
>> b/drivers/net/wireless/ath/wcn36xx/main.c
>> index 7c169ab..82982d5 100644
>> --- a/drivers/net/wireless/ath/wcn36xx/main.c
>> +++ b/drivers/net/wireless/ath/wcn36xx/main.c
>> @@ -935,7 +935,6 @@ static const struct ieee80211_ops wcn36xx_ops = {
>>
>>   static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
>>   {
>> -   int ret = 0;
>>
>
>   This empty line should be removed too.
>
>> static const u32 cipher_suites[] = {
>> WLAN_CIPHER_SUITE_WEP40,
>
> [...]
>
> MBR, Sergei
>
I have resend this patch as you said Sergei . Thanks for feedback ..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread Punit Vara
This patch is to the ath5k/eeprom.c that fixes up warning caught by
coccicheck:

-Unneeded variable: "ret". Return "0" on line 1733

Remove unneccesary variable ret created to return zero.

Also removed empty line suggested by Sergei
 
Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/ath5k/eeprom.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c 
b/drivers/net/wireless/ath/ath5k/eeprom.c
index 94d34ee..0c82ea5 100644
--- a/drivers/net/wireless/ath/ath5k/eeprom.c
+++ b/drivers/net/wireless/ath/ath5k/eeprom.c
@@ -1707,8 +1707,7 @@ ath5k_eeprom_read_spur_chans(struct ath5k_hw *ah)
struct ath5k_eeprom_info *ee = >ah_capabilities.cap_eeprom;
u32 offset;
u16 val;
-   int ret = 0, i;
-
+   int i;
offset = AR5K_EEPROM_CTL(ee->ee_version) +
AR5K_EEPROM_N_CTLS(ee->ee_version);
 
@@ -1730,7 +1729,7 @@ ath5k_eeprom_read_spur_chans(struct ath5k_hw *ah)
}
}
 
-   return ret;
+   return 0;
 }
 
 
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy

2015-10-22 Thread Vladimir Davydov
Hi Johannes,

On Thu, Oct 22, 2015 at 12:21:28AM -0400, Johannes Weiner wrote:
...
> Patch #5 adds accounting and tracking of socket memory to the unified
> hierarchy memory controller, as described above. It uses the existing
> per-cpu charge caches and triggers high limit reclaim asynchroneously.
> 
> Patch #8 uses the vmpressure extension to equalize pressure between
> the pages tracked natively by the VM and socket buffer pages. As the
> pool is shared, it makes sense that while natively tracked pages are
> under duress the network transmit windows are also not increased.

First of all, I've no experience in networking, so I'm likely to be
mistaken. Nevertheless I beg to disagree that this patch set is a step
in the right direction. Here goes why.

I admit that your idea to get rid of explicit tcp window control knobs
and size it dynamically basing on memory pressure instead does sound
tempting, but I don't think it'd always work. The problem is that in
contrast to, say, dcache, we can't shrink tcp buffers AFAIU, we can only
stop growing them. Now suppose a system hasn't experienced memory
pressure for a while. If we don't have explicit tcp window limit, tcp
buffers on such a system might have eaten almost all available memory
(because of network load/problems). If a user workload that needs a
significant amount of memory is started suddenly then, the network code
will receive a notification and surely stop growing buffers, but all
those buffers accumulated won't disappear instantly. As a result, the
workload might be unable to find enough free memory and have no choice
but invoke OOM killer. This looks unexpected from the user POV.

That said, I think we do need per memcg tcp window control similar to
what we have system-wide. In other words, Glauber's work makes sense to
me. You might want to point me at my RFC patch where I proposed to
revert it (https://lkml.org/lkml/2014/9/12/401). Well, I've changed my
mind since then. Now I think I was mistaken, luckily I was stopped.
However, I may be mistaken again :-)

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] net: try harder to not reuse ifindex when moving interfaces

2015-10-22 Thread Thomas Graf
On 10/22/15 at 07:21pm, Hannes Frederic Sowa wrote:
> Hi Thomas,
> 
> On Thu, Oct 22, 2015, at 18:45, Thomas Graf wrote:
> > I understand the race but when does it occur? Whoever creates
> > the original interface owns it and is responsible for its
> > lifecycle. *Iff* for some reason multiple entities manipulate
> > the interface, then it's probably a lot safer to just use flock
> > or something similar to serialize access entirely in user space.
> 
> This only works if all networking configuration programs would
> standardize on the same flock. Also, under memory pressure we lose
> netlink monitor messages, so we need to deal with timeouts and retries
> and manual sync up on the networking configuration, which makes this
> scheme a lot harder. For normal socket io, where we specify e.g. ifindex
> in sin6_addr, this is not really usable at all.

Again, what is the scenario where this happens? Is this being
hit or are we talking theoretical races? I'd like to understand
the background of this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Al Viro
On Thu, Oct 22, 2015 at 06:39:34PM +0100, Alan Burlison wrote:
> On 22/10/2015 18:05, Al Viro wrote:
> 
> >Oh, for...  Right in this thread an example of complete BS has been quoted
> >from POSIX close(2).  The part about closing a file when the last descriptor
> >gets closed.  _Nothing_ is POSIX-compliant in that respect (nor should
> >it be).
> 
> That's not exactly what it says, we've already discussed, for
> example in the case of pending async IO on a filehandle.

Sigh...  It completely fails to mention descriptor-passing.  Which
a) is relevant to what "last close" means and
b) had been there for nearly the third of a century.

> I agree that part could do with some polishing.

google("wire brush of enlightenment") is what comes to mind...

> >and (b) says fsck-all about the effects of closing descriptor.  The latter
> >is a problem, since nothing in close(2) bothers making a distinction between
> >the effects specific to particular syscall and those common to all ways of
> >closing a descriptor.  And no, it's not a nitpicking - consider e.g. the
> >parts concerning the order of events triggered by close(2) (such and such
> >should be completed before close(2) returns); should it be taken as "same
> >events should be completed before newfd is associated with the file 
> >description
> >refered to by oldfd"?  It _is_ user-visible, since close(2) removes fcntl
> >locks.  Sure, there is (otherwise unexplained)
> > The dup2() function is not intended for use in critical regions
> > as a synchronization mechanism.
> >down in informative sections, so one can infer that event order here isn't
> >to be relied upon.  With no way to guess whether the event order concerning
> >e.g. effect on ongoing accept(newfd) is any different in that respect.
> 
> I think "it shall be closed first" makes it pretty clear that what
> is expected is the same behaviour as any direct invocation of close,
> and that has to happen before the reassignment. What makes you
> believe that's isn't the case?

So unless I'm misparsing something, you want
thread A: accept(newfd)
thread B: dup2(oldfd, newfd)
have accept() bugger off before the switchover happens?

What should happen if thread C does accept(newfd) right as B has decided that
there's nothing more to wait?  For close(newfd) it would be simple - we are
going to have lookup by descriptor fail with EBADF anyway, so making it do
so as soon as we go hunting for those who are currently in accept(newfd)
would do the trick - no new threads like that shall appear and as long as
the descriptor is not declared free for taking by descriptor allocation nobody
is going to be screwed by open() picking that slot of descriptor table too
early.  Trying to do that for dup2() would lose atomicity.  I honestly don't
know how Solaris behaves in that case, BTW - the race (if any) would probably
be hard to hit, so in case of Linux I would have to go and RTFS before saying
that there isn't one.  I can't do that in with Solaris; all I can do here
is ask you guys...

Moreover, see above for record locks removal.  Should that happen prior to
switchover?  If you have

dup(fd, fd2);
set a record lock on fd2
spawn a thread
in child, try to grab the same lock on fd2
in parent, do some work and close(fd)

you are guaranteed that child won't see fd refering to the same file after it
acquires the lock.
Replace close(fd) with dup(fd3, fd); should the same hold true in that case?

FWIW, Linux behaviour in that area is to have record locks removal done
between the switchover and return to userland in case of dup2() and between
the removal from descriptor table and return to userland in case of close().

> Personally I believe the spec is clear enough to allow an
> unambiguous interpretation of the required behavior in this area. If
> you think there are areas where the Solaris behaviour is in
> disagreement with the spec then I'd be interested to hear them.

The spec is so vague that I strongly suspect that *both* Solaris and Linux
behaviours are not in disagreement with it (modulo shutdown(2) extension
Linux-side and we are really stuck with that one).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Al Viro
On Thu, Oct 22, 2015 at 08:24:51PM +0200, casper@oracle.com wrote:

> The external behaviour atomic; you cannot distinguish the order
> between the closing of the original file (and waking up other threads
> waiting for a record lock) or changing the file referenced by that newfd.
> 
> But this not include a global or process specific lock.

Interesting...  Do you mean that decriptor-to-file lookup blocks until that
rundown finishes?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


CONFIG_XPS depends on L1_CACHE_BYTES being greater than sizeof(struct xps_map)

2015-10-22 Thread Helge Deller
Hi Tom & David,

I've queued-up a patch for the parisc architecture which reduces L1_CACHE_BYTES 
from 32 to 16:
  https://patchwork.kernel.org/patch/7399291/

But this change will break the kernel build like this:

In file included from net/core/dev.c:92:0:
net/core/dev.c: In function ‘expand_xps_map’:
include/linux/netdevice.h:721:27: warning: overflow in implicit constant 
conversion [-Woverflow]
   #define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map)) \
net/core/dev.c:1972:18: note: in expansion of macro ‘XPS_MIN_MAP_ALLOC’
   int alloc_len = XPS_MIN_MAP_ALLOC;

Do you see an easy way to fix this ?

Thanks,
Helge 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread punit vara
On Fri, Oct 23, 2015 at 12:05 AM, Sergei Shtylyov
 wrote:
> Hello.
>
> On 10/22/2015 09:26 PM, Punit Vara wrote:
>
>> This patch is to the ath5k/eeprom.c that fixes up warning caught by
>> coccicheck:
>>
>> -Unneeded variable: "ret". Return "0" on line 1733
>>
>> Remove unneccesary variable ret created to return zero.
>>
>> Also removed empty line suggested by Sergei
>>
>> Signed-off-by: Punit Vara 
>> ---
>>   drivers/net/wireless/ath/ath5k/eeprom.c | 5 ++---
>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c
>> b/drivers/net/wireless/ath/ath5k/eeprom.c
>> index 94d34ee..0c82ea5 100644
>> --- a/drivers/net/wireless/ath/ath5k/eeprom.c
>> +++ b/drivers/net/wireless/ath/ath5k/eeprom.c
>> @@ -1707,8 +1707,7 @@ ath5k_eeprom_read_spur_chans(struct ath5k_hw *ah)
>> struct ath5k_eeprom_info *ee = >ah_capabilities.cap_eeprom;
>> u32 offset;
>> u16 val;
>> -   int ret = 0, i;
>> -
>
>
>No, this one shouldn't have been removed. There should an empty line
> between the declarations and the statements.
>
>> +   int i;
>> offset = AR5K_EEPROM_CTL(ee->ee_version) +
>> AR5K_EEPROM_N_CTLS(ee->ee_version);
>>
>
> MBR, Sergei
>

I have resent patch with update but I cant see it in my mailbox I dont
know Why.. Sorry if You are getting mail more time. I am sending final
version v5 as last.Please sorry for inconvenience
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread Punit Vara
Remove empty line suggested by Sergei

This patch is to the ath5k/eeprom.c that fixes up warning caught by
coccicheck:

Unneeded variable: "ret". Return "0" on line 980

Remove unneeded variable ret created to return zero.

Signed-off-by: Punit Vara 
---
 drivers/net/wireless/ath/wcn36xx/main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 900e72a..94bcc08 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -935,8 +935,6 @@ static const struct ieee80211_ops wcn36xx_ops = {
 
 static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
 {
-   int ret = 0;
-
static const u32 cipher_suites[] = {
WLAN_CIPHER_SUITE_WEP40,
WLAN_CIPHER_SUITE_WEP104,
@@ -977,7 +975,7 @@ static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
wcn->hw->sta_data_size = sizeof(struct wcn36xx_sta);
wcn->hw->vif_data_size = sizeof(struct wcn36xx_vif);
 
-   return ret;
+   return 0;
 }
 
 static int wcn36xx_platform_get_resources(struct wcn36xx *wcn,
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8] mm: memcontrol: hook up vmpressure to socket pressure

2015-10-22 Thread Vladimir Davydov
On Thu, Oct 22, 2015 at 12:21:36AM -0400, Johannes Weiner wrote:
...
> @@ -185,8 +183,29 @@ static void vmpressure_work_fn(struct work_struct *work)
>   vmpr->reclaimed = 0;
>   spin_unlock(>sr_lock);
>  
> + level = vmpressure_calc_level(scanned, reclaimed);
> +
> + if (level > VMPRESSURE_LOW) {

So we start socket_pressure at MEDIUM. Why not at LOW or CRITICAL?

> + struct mem_cgroup *memcg;
> + /*
> +  * Let the socket buffer allocator know that we are
> +  * having trouble reclaiming LRU pages.
> +  *
> +  * For hysteresis, keep the pressure state asserted
> +  * for a second in which subsequent pressure events
> +  * can occur.
> +  *
> +  * XXX: is vmpressure a global feature or part of
> +  * memcg? There shouldn't be anything memcg-specific
> +  * about exporting reclaim success ratios from the VM.
> +  */
> + memcg = container_of(vmpr, struct mem_cgroup, vmpressure);
> + if (memcg != root_mem_cgroup)
> + memcg->socket_pressure = jiffies + HZ;

Why 1 second?

Thanks,
Vladimir

> + }
> +
>   do {
> - if (vmpressure_event(vmpr, scanned, reclaimed))
> + if (vmpressure_event(vmpr, level))
>   break;
>   /*
>* If not handled, propagate the event upward into the
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Al Viro
On Thu, Oct 22, 2015 at 08:34:19AM +0200, casper@oracle.com wrote:
> 
> 
> >And I'm really curious about the things Solaris would do with dup2() there.
> >Does it take into account the possibility of new accept() coming just as
> >dup2() is trying to terminate the ongoing ones?  Is there a window when
> >descriptor-to-file lookups would fail?  Looks like a race/deadlock country...
> 
> Solaris does not "terminate" threads, instead it tells them that the
> file descriptor information used is stale and wkae's up the thread.

Sorry, lousy wording - I meant "terminate syscall in another thread".
Better yet, make that "what happens if new accept(newfd) comes while dup2()
waits for affected syscalls in other threads to finish"?  Assuming it
does wait, that is...

While we are at it, what's the relative order of record locks removal
and switching the meaning of newfd?  In our kernel it happens *after*
the switchover (i.e. if another thread is waiting for a record lock held on
any alias of newfd and we do dup2(oldfd, newfd), the waiter will not see
the state with newfd still refering to what it used to; note that waiter
might be using any descriptor refering to the file newfd used to refer
to, so it won't be affected by the "wake those who had the meaning of
their arguments change" side of things).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread Sergei Shtylyov

Hello.

On 10/22/2015 09:26 PM, Punit Vara wrote:


This patch is to the ath5k/eeprom.c that fixes up warning caught by
coccicheck:

-Unneeded variable: "ret". Return "0" on line 1733

Remove unneccesary variable ret created to return zero.

Also removed empty line suggested by Sergei

Signed-off-by: Punit Vara 
---
  drivers/net/wireless/ath/ath5k/eeprom.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c 
b/drivers/net/wireless/ath/ath5k/eeprom.c
index 94d34ee..0c82ea5 100644
--- a/drivers/net/wireless/ath/ath5k/eeprom.c
+++ b/drivers/net/wireless/ath/ath5k/eeprom.c
@@ -1707,8 +1707,7 @@ ath5k_eeprom_read_spur_chans(struct ath5k_hw *ah)
struct ath5k_eeprom_info *ee = >ah_capabilities.cap_eeprom;
u32 offset;
u16 val;
-   int ret = 0, i;
-


   No, this one shouldn't have been removed. There should an empty line 
between the declarations and the statements.



+   int i;
offset = AR5K_EEPROM_CTL(ee->ee_version) +
AR5K_EEPROM_N_CTLS(ee->ee_version);



MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 14/15] net: wireless: ath: Remove unneeded variable ret returning 0

2015-10-22 Thread Sergei Shtylyov

On 10/22/2015 09:47 PM, Punit Vara wrote:


Remove black line suggested by Sergei


   Such kind of comments should be under the --- tear line.



This patch is to the ath5k/eeprom.c that fixes up warning caught by
coccicheck:

Unneeded variable: "ret". Return "0" on line 980

Remove unneeded variable ret created to return zero.

Signed-off-by: Punit Vara 

[...]

MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] devicetree: add binding for Aurora VLSI NB8800 Ethernet controller

2015-10-22 Thread Rob Herring
On Thu, Oct 22, 2015 at 11:28 AM, Mans Rullgard  wrote:
> This adds a binding for the Aurora VLSI NB8800 Ethernet controller
> using the "aurora,nb8800" compatible string.  When used in Sigma
> Designs chips a few additional control registers are available.
> This variant is indicated by the "sigma,smp8640-ethernet" compatible
> string.
>
> Signed-off-by: Mans Rullgard 

Acked-by: Rob Herring 

> ---
> Changes:
> - correct order of compatible strings
> ---
>  .../devicetree/bindings/net/aurora,nb8800.txt  | 26 
> ++
>  1 file changed, 26 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/aurora,nb8800.txt
>
> diff --git a/Documentation/devicetree/bindings/net/aurora,nb8800.txt 
> b/Documentation/devicetree/bindings/net/aurora,nb8800.txt
> new file mode 100644
> index 000..cf7108d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/aurora,nb8800.txt
> @@ -0,0 +1,26 @@
> +* Aurora VLSI AU-NB8800 Ethernet controller
> +
> +Required properties:
> +- compatible: Should be "sigma,smp8640-ethernet", "aurora,nb8800"
> +   The former indicates presence of extra features added by Sigma 
> Designs.
> +- reg: Should be MMIO address space of the device
> +- interrupts: Should contain the interrupt specifier for the device
> +- interrupt-parent: Should be a phandle for the interrupt controller
> +- clocks: Should be a phandle for the clock for the device
> +
> +Common properties described in ethernet.txt:
> +- local-mac-address
> +- mac-address
> +- max-speed
> +- phy-mode
> +
> +Example:
> +
> +ethernet@26000 {
> +   compatible = "aurora,nb8800";
> +   reg = <0x1 0x800>;
> +   interrupts = <42>;
> +   clocks = <_clk>;
> +   max-speed = <1000>;
> +   phy-connection-type = "rgmii";
> +};
> --
> 2.5.3
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Casper . Dik

From: Al Viro 

>On Thu, Oct 22, 2015 at 06:39:34PM +0100, Alan Burlison wrote:
>> On 22/10/2015 18:05, Al Viro wrote:
>> 
>> >Oh, for...  Right in this thread an example of complete BS has been quoted
>> >from POSIX close(2).  The part about closing a file when the last descriptor
>> >gets closed.  _Nothing_ is POSIX-compliant in that respect (nor should
>> >it be).
>> 
>> That's not exactly what it says, we've already discussed, for
>> example in the case of pending async IO on a filehandle.
>
>Sigh...  It completely fails to mention descriptor-passing.  Which
>   a) is relevant to what "last close" means and
>   b) had been there for nearly the third of a century.

Why is that different?  These clearly count as file descriptors.

>> I agree that part could do with some polishing.
>
>google("wire brush of enlightenment") is what comes to mind...

Standardese is similar to legalese; it not writing that is directly open 
to interpretation to those who are not inducted in writing may have some 
problem interpreting what exactly is meant by wording of the standard.



>> I think "it shall be closed first" makes it pretty clear that what
>> is expected is the same behaviour as any direct invocation of close,
>> and that has to happen before the reassignment. What makes you
>> believe that's isn't the case?
>
>So unless I'm misparsing something, you want
>thread A: accept(newfd)
>thread B: dup2(oldfd, newfd)
>have accept() bugger off before the switchover happens?

Well, certainly *before* we return from dup2().
(and clearly only once we have determined that dup2() will return
successfully)

>What should happen if thread C does accept(newfd) right as B has decided that
>there's nothing more to wait?  For close(newfd) it would be simple - we are
>going to have lookup by descriptor fail with EBADF anyway, so making it do
>so as soon as we go hunting for those who are currently in accept(newfd)
>would do the trick - no new threads like that shall appear and as long as
>the descriptor is not declared free for taking by descriptor allocation nobody
>is going to be screwed by open() picking that slot of descriptor table too
>early.  Trying to do that for dup2() would lose atomicity.  I honestly don't
>know how Solaris behaves in that case, BTW - the race (if any) would probably
>be hard to hit, so in case of Linux I would have to go and RTFS before saying
>that there isn't one.  I can't do that in with Solaris; all I can do here
>is ask you guys...

Solaris dup2() behaves exactly like close().

>Moreover, see above for record locks removal.  Should that happen prior to
>switchover?  If you have
>
>dup(fd, fd2);
>set a record lock on fd2
>spawn a thread
>in child, try to grab the same lock on fd2
>in parent, do some work and close(fd)

>you are guaranteed that child won't see fd refering to the same file after it
>acquires the lock.

Here's you are talking about a lock held by the "parent" and that the
"child" will only get the lock once close(fd) is done?

Yes.  The final "close" is done *after* the pointer has been removed from 
the file descriptor table.

>Replace close(fd) with dup(fd3, fd); should the same hold true in that case?

Yes.

>FWIW, Linux behaviour in that area is to have record locks removal done
>between the switchover and return to userland in case of dup2() and between
>the removal from descriptor table and return to userland in case of close().
>
>> Personally I believe the spec is clear enough to allow an
>> unambiguous interpretation of the required behavior in this area. If
>> you think there are areas where the Solaris behaviour is in
>> disagreement with the spec then I'd be interested to hear them.
>
>The spec is so vague that I strongly suspect that *both* Solaris and Linux
>behaviours are not in disagreement with it (modulo shutdown(2) extension
>Linux-side and we are really stuck with that one).

I'm not sure if the standard allows a handful of threads in accept() for a 
file descriptor which has already been closed *and* can be re-issued for 
other uses.

Casper

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: of_mdiobus_register_phy() and deferred probe

2015-10-22 Thread Sergei Shtylyov

Hello.

On 10/22/2015 04:31 PM, Geert Uytterhoeven wrote:


Due to a probe deferral of an interrupt controller[1], the Micrel
Ethernet PHY on
r8a7791/koelsch started failing to get its IRQ:

 no irq domain found for /interrupt-controller@e61c !

However, of_mdiobus_register_phy() uses irq_of_parse_and_map(), which plainly
ignores EPROBE_DEFER, and it just continues.

Later I get:

 sh-eth ee70.ethernet eth0: attached PHY 1 (IRQ -1) to driver
Micrel KSZ8041RNLI

instead of

 sh-eth ee70.ethernet eth0: attached PHY 1 (IRQ 408) to driver
Micrel KSZ8041RNLI

Ethernet still works, as the interrupt seems to be unneeded(?).


   Yes, the phylib uses PHY polling anyway, IRQ isn't strictly necessary.


Has anyone already looked into fixing of_mdio to handle deferred probing?


   It's the first time I hear about that. Will have to look into this...


Gr{oetje,eeting}s,
 Geert


MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] can: xilinx: use readl/writel instead of ioread/iowrite

2015-10-22 Thread Marc Kleine-Budde
On 10/22/2015 10:14 AM, Arnd Bergmann wrote:
> On Thursday 22 October 2015 10:16:02 Kedareswara rao Appana wrote:
>> The driver only supports memory-mapped I/O [by ioremap()],
>> so readl/writel is actually the right thing to do, IMO.
>> During the validation of this driver or IP on ARM 64-bit processor
>> while sending lot of packets observed that the tx packet drop with iowrite
>> Putting the barriers for each tx fifo register write fixes this issue
>> Instead of barriers using writel also fixed this issue.
>>
>> Signed-off-by: Kedareswara rao Appana 
> 
> The two should really do the same thing: iowrite32() is just a static inline
> calling writel() on both ARM32 and ARM64. On which kernel version did you
> observe the difference? It's possible that an older version used
> CONFIG_GENERIC_IOMAP, which made this slightly more expensive.
> 
> If there are barriers that you want to get rid of for performance reasons,
> you should use writel_relaxed(), but be careful to synchronize them correctly
> with regard to DMA. It should be fine in this driver, as it does not
> perform any DMA, but be aware that there is no big-endian version of
> writel_relaxed() at the moment.

We don't have DMA in CAN drivers, but usually a certain write triggers
sending. Do we need a barrier before triggering the sending?

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature


[PATCH 3/3] net: dsa: Make mv88e6060 use nested mdiobus read/write

2015-10-22 Thread Neil Armstrong
Like mv88e6xxx and mdio-mux, to avoid lockdep give false positives
because of nested MDIO busses, switch to previously introduced
nested mdiobus_read/write variants.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index c29aebe..9093577 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -26,7 +26,7 @@ static int reg_read(struct dsa_switch *ds, int addr, int reg)
if (bus == NULL)
return -EINVAL;

-   return mdiobus_read(bus, ds->pd->sw_addr + addr, reg);
+   return mdiobus_read_nested(bus, ds->pd->sw_addr + addr, reg);
 }

 #define REG_READ(addr, reg)\
@@ -47,7 +47,7 @@ static int reg_write(struct dsa_switch *ds, int addr, int 
reg, u16 val)
if (bus == NULL)
return -EINVAL;

-   return mdiobus_write(bus, ds->pd->sw_addr + addr, reg, val);
+   return mdiobus_write_nested(bus, ds->pd->sw_addr + addr, reg, val);
 }

 #define REG_WRITE(addr, reg, val)  \
-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] net: dsa: Make mv88e6xxx use nested mdiobus read/write

2015-10-22 Thread Neil Armstrong
Make the mv88e6xxx driver use the previously introduced nested
variants of mdiobus_read/write functions.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6xxx.c | 46 +
 1 file changed, 9 insertions(+), 37 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 4591240..e3cc66e 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -26,34 +26,6 @@
 #include 
 #include "mv88e6xxx.h"

-/* MDIO bus access can be nested in the case of PHYs connected to the
- * internal MDIO bus of the switch, which is accessed via MDIO bus of
- * the Ethernet interface. Avoid lockdep false positives by using
- * mutex_lock_nested().
- */
-static int mv88e6xxx_mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
-{
-   int ret;
-
-   mutex_lock_nested(>mdio_lock, SINGLE_DEPTH_NESTING);
-   ret = bus->read(bus, addr, regnum);
-   mutex_unlock(>mdio_lock);
-
-   return ret;
-}
-
-static int mv88e6xxx_mdiobus_write(struct mii_bus *bus, int addr, u32 regnum,
-  u16 val)
-{
-   int ret;
-
-   mutex_lock_nested(>mdio_lock, SINGLE_DEPTH_NESTING);
-   ret = bus->write(bus, addr, regnum, val);
-   mutex_unlock(>mdio_lock);
-
-   return ret;
-}
-
 /* If the switch's ADDR[4:0] strap pins are strapped to zero, it will
  * use all 32 SMI bus addresses on its SMI bus, and all switch registers
  * will be directly accessible on some {device address,register address}
@@ -68,7 +40,7 @@ static int mv88e6xxx_reg_wait_ready(struct mii_bus *bus, int 
sw_addr)
int i;

for (i = 0; i < 16; i++) {
-   ret = mv88e6xxx_mdiobus_read(bus, sw_addr, SMI_CMD);
+   ret = mdiobus_read_nested(bus, sw_addr, SMI_CMD);
if (ret < 0)
return ret;

@@ -84,7 +56,7 @@ int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, 
int addr, int reg)
int ret;

if (sw_addr == 0)
-   return mv88e6xxx_mdiobus_read(bus, addr, reg);
+   return mdiobus_read_nested(bus, addr, reg);

/* Wait for the bus to become free. */
ret = mv88e6xxx_reg_wait_ready(bus, sw_addr);
@@ -92,8 +64,8 @@ int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, 
int addr, int reg)
return ret;

/* Transmit the read command. */
-   ret = mv88e6xxx_mdiobus_write(bus, sw_addr, SMI_CMD,
- SMI_CMD_OP_22_READ | (addr << 5) | reg);
+   ret = mdiobus_write_nested(bus, sw_addr, SMI_CMD,
+  SMI_CMD_OP_22_READ | (addr << 5) | reg);
if (ret < 0)
return ret;

@@ -103,7 +75,7 @@ int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, 
int addr, int reg)
return ret;

/* Read the data. */
-   ret = mv88e6xxx_mdiobus_read(bus, sw_addr, SMI_DATA);
+   ret = mdiobus_read_nested(bus, sw_addr, SMI_DATA);
if (ret < 0)
return ret;

@@ -147,7 +119,7 @@ int __mv88e6xxx_reg_write(struct mii_bus *bus, int sw_addr, 
int addr,
int ret;

if (sw_addr == 0)
-   return mv88e6xxx_mdiobus_write(bus, addr, reg, val);
+   return mdiobus_write_nested(bus, addr, reg, val);

/* Wait for the bus to become free. */
ret = mv88e6xxx_reg_wait_ready(bus, sw_addr);
@@ -155,13 +127,13 @@ int __mv88e6xxx_reg_write(struct mii_bus *bus, int 
sw_addr, int addr,
return ret;

/* Transmit the data to write. */
-   ret = mv88e6xxx_mdiobus_write(bus, sw_addr, SMI_DATA, val);
+   ret = mdiobus_write_nested(bus, sw_addr, SMI_DATA, val);
if (ret < 0)
return ret;

/* Transmit the write command. */
-   ret = mv88e6xxx_mdiobus_write(bus, sw_addr, SMI_CMD,
- SMI_CMD_OP_22_WRITE | (addr << 5) | reg);
+   ret = mdiobus_write_nested(bus, sw_addr, SMI_CMD,
+  SMI_CMD_OP_22_WRITE | (addr << 5) | reg);
if (ret < 0)
return ret;

-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: dsa: mv88e6060: Fix false positive lockdep splat

2015-10-22 Thread Andrew Lunn
> Well, mdio-mux also calls switch_fn inside the mdio_lock, clean refactoring
> would introduce a separate lock and call the nested variants.
> Is that ok ? Can someone test mdio-mux if I make the change ?

Hi Neil

I would not touch mdio-mux. As you said, it does more than lock, read,
unlock. It is not something sufficiently generic to place into shared
code.

Andrew
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] treewide: Remove newlines inside DEFINE_PER_CPU() macros

2015-10-22 Thread Michal Marek
On 2015-10-22 13:31, Prarit Bhargava wrote:
> 
> 
> On 10/21/2015 03:52 PM, Michal Marek wrote:
>> Dne 21.10.2015 v 21:27 Prarit Bhargava napsal(a):
>>> On 10/15/2015 04:16 PM, Michal Marek wrote:
 Otherwise make tags can't parse them:

 ctags: Warning: arch/ia64/kernel/smp.c:60: null expansion of name pattern 
 "\1"
 ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name 
 pattern "\1"
 ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name 
 pattern "\1"
 ctags: Warning: kernel/locking/lockdep.c:153: null expansion of name 
 pattern "\1"
 ctags: Warning: kernel/workqueue.c:305: null expansion of name pattern "\1"
 ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name 
 pattern "\1"
 ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name 
 pattern "\1"
 ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
 ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern 
 "\1"
 ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern 
 "\1"
>>>
>>> I guarantee you're going to end up fixing this issue over and over again as 
>>> more
>>> code is added in.
>>
>> This is certainly going to happen, but it should be quickly spotted by
>> anybody running make tags on linux-next. And 10 instances since the
>> beginning of git is not too many.
> 
> Not everyone uses 'make tags'.  'make cscope' exists and functions correctly 
> ;)

cscope works, but unfortunately it cannot be extended to understand the
preprocessor constructs. But it does not suffer from the problem at
hand, obviously.


>> It's not ctags itself parsing the DEFINE_PER_CPU() macro, but a
>> user-supplied regex specified on commandline. Which can only operate on
>> single lines.
>>
> 
> What's the regex?

See
https://lkml.kernel.org/r/1444940195-28272-9-git-send-email-mma...@suse.com

It used to require a closing parenthesis, so it would not match the
multiline macro invocations at all. Now it matches them, but ctags
correctly warns that the empty string is probably not what we intended
to match.

Michal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pull request: bluetooth-next 2015-10-22

2015-10-22 Thread Johan Hedberg
Hi Dave,

Here's probably the last bluetooth-next pull request for 4.4. Among
several other changes it contains the rest of the fixes & cleanups from
the Bluetooth UnplugFest (that didn't need to be hurried to 4.3).

 - Refactoring & cleanups to 6lowpan code
 - New USB ids for two Atheros controllers and BCM43142A0 from Broadcom
 - Fix (quirk) for broken Broadcom BCM2045 controllers
 - Support for latest Apple controllers
 - Improvements to the vendor diagnostic message support

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 26440c835f8b1a491e2704118ac55bf87334366c:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-10-20 
06:08:27 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git 
for-upstream

for you to fetch changes up to 13972adc3240ea8b18b44906b819c622941a64b6:

  Bluetooth: Increase minor version of core module (2015-10-22 13:37:26 +0300)


Alexander Aring (13):
  mac802154: llsec: use kzfree
  bluetooth: 6lowpan: use lowpan dispatch helpers
  6lowpan: introduce LOWPAN_IPHC_MAX_HC_BUF_LEN
  6lowpan: cleanup lowpan_header_compress
  6lowpan: cleanup lowpan_header_decompress
  6lowpan: remove lowpan_fetch_skb_u8
  6lowpan: nhc: move iphc manipulation out of nhc
  6lowpan: move IPHC functionality defines
  6lowpan: remove lowpan_is_addr_broadcast
  6lowpan: iphc: change define values
  6lowpan: rework tc and flow label handling
  6lowpan: put mcast compression in an own function
  ieee802154: 6lowpan: fix memory leak

Arnd Bergmann (1):
  Bluetooth: bpa10x: fix BT_HCIUART dependency

Dan Carpenter (1):
  Bluetooth: hci_bcm: checking for ERR_PTR instead of NULL

David Herrmann (1):
  Bluetooth: hidp: fix device disconnect on idle timeout

Dean Jenkins (3):
  Bluetooth: Unwind l2cap_sock_shutdown()
  Bluetooth: Reorganize mutex lock in l2cap_sock_shutdown()
  Bluetooth: l2cap_disconnection_req priority over shutdown

Dmitry Tunin (2):
  Bluetooth: ath3k: Add new AR3012 0930:021c id
  Bluetooth: ath3k: Add support of AR3012 0cf3:817b device

Johan Hedberg (16):
  Bluetooth: Don't use remote address type to decide IRK persistency
  Bluetooth: Fix removing connection parameters when unpairing
  Bluetooth: Fix missing hdev locking for LE scan cleanup
  Bluetooth: Add le_addr_type() helper function
  Bluetooth: Add hci_conn_hash_lookup_le() helper function
  Bluetooth: Use hci_conn_hash_lookup_le() when possible
  Bluetooth: 6lowpan: Use hci_conn_hash_lookup_le() when possible
  Bluetooth: Remove unnecessary indentation in unpair_device()
  Bluetooth: Add hdev helper variable to hci_le_create_connection_cancel
  Bluetooth: Remove redundant (and possibly wrong) flag clearing
  Bluetooth: Remove unnecessary hci_explicit_connect_lookup function
  Bluetooth: Disable auto-connection parameters when unpairing
  Bluetooth: Fix crash in SMP when unpairing
  Bluetooth: Introduce hci_req helper to abort a connection
  Bluetooth: Take advantage of connection abort helpers
  Bluetooth: Make hci_disconnect() behave correctly for all states

Marcel Holtmann (21):
  Bluetooth: bpa10x: Fix missing BT_HCIUART dependency
  Bluetooth: btusb: Add support for Broadcom LM_DIAG interface
  Bluetooth: btintel: Add support for enabling tracing functionality
  Bluetooth: Remove quirk for HCI_VENDOR_PKT filter handling
  Bluetooth: Restrict valid packet types via HCI_CHANNEL_RAW
  Bluetooth: Queue diagnostic messages together with HCI packets
  Bluetooth: btusb: Print information of Intel SfP lock states
  Bluetooth: Add new quirk for non-persistent diagnostic settings
  Bluetooth: btintel: Set quirk for non-persistent diagnostic settings
  Bluetooth: btintel: Add diagnostic support for older controllers
  Bluetooth: btusb: Mark BCM2045 devices to have broken link key commands
  Bluetooth: btbcm: Fix firmware version number calculation
  Bluetooth: btbcm: Read USB product information for Apple devices
  Bluetooth: Add support setup stage internal notification event
  Bluetooth: btusb: Set early vendor info for Intel and Broadcom
  Bluetooth: btusb: Add support for latest Apple controllers
  Bluetooth: hci_uart: Provide initial manufacturer information
  Bluetooth: Introduce driver specific post init callback
  Bluetooth: btusb: Set manufacturer for Intel bootloader devices
  Bluetooth: btintel: Enable extra Intel vendor events
  Bluetooth: Increase minor version of core module

Santtu Rekilä (1):
  Bluetooth: btusb: Add support for Foxconn/Lenovo BCM43142A0 (105b:e065)

 drivers/bluetooth/Kconfig|   2 +-
 drivers/bluetooth/ath3k.c|   4 +
 drivers/bluetooth/btbcm.c

Re: [PATCH 9/9] treewide: Remove newlines inside DEFINE_PER_CPU() macros

2015-10-22 Thread Prarit Bhargava


On 10/21/2015 03:52 PM, Michal Marek wrote:
> Dne 21.10.2015 v 21:27 Prarit Bhargava napsal(a):
>> On 10/15/2015 04:16 PM, Michal Marek wrote:
>>> Otherwise make tags can't parse them:
>>>
>>> ctags: Warning: arch/ia64/kernel/smp.c:60: null expansion of name pattern 
>>> "\1"
>>> ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name 
>>> pattern "\1"
>>> ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name 
>>> pattern "\1"
>>> ctags: Warning: kernel/locking/lockdep.c:153: null expansion of name 
>>> pattern "\1"
>>> ctags: Warning: kernel/workqueue.c:305: null expansion of name pattern "\1"
>>> ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern 
>>> "\1"
>>> ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern 
>>> "\1"
>>> ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
>>> ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern 
>>> "\1"
>>> ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern 
>>> "\1"
>>
>> I guarantee you're going to end up fixing this issue over and over again as 
>> more
>> code is added in.
> 
> This is certainly going to happen, but it should be quickly spotted by
> anybody running make tags on linux-next. And 10 instances since the
> beginning of git is not too many.

Not everyone uses 'make tags'.  'make cscope' exists and functions correctly ;)

> 
> 
>> OOC, why not fix ctags to recognize newlines?
> 
> It's not ctags itself parsing the DEFINE_PER_CPU() macro, but a
> user-supplied regex specified on commandline. Which can only operate on
> single lines.
> 

What's the regex?

P.

> Michal
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] treewide: Remove newlines inside DEFINE_PER_CPU() macros

2015-10-22 Thread Prarit Bhargava


On 10/22/2015 08:06 AM, Michal Marek wrote:
> On 2015-10-22 13:31, Prarit Bhargava wrote:
>>
>>
>> On 10/21/2015 03:52 PM, Michal Marek wrote:
>>> Dne 21.10.2015 v 21:27 Prarit Bhargava napsal(a):
 On 10/15/2015 04:16 PM, Michal Marek wrote:
> Otherwise make tags can't parse them:
>
> ctags: Warning: arch/ia64/kernel/smp.c:60: null expansion of name pattern 
> "\1"
> ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name 
> pattern "\1"
> ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name 
> pattern "\1"
> ctags: Warning: kernel/locking/lockdep.c:153: null expansion of name 
> pattern "\1"
> ctags: Warning: kernel/workqueue.c:305: null expansion of name pattern 
> "\1"
> ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name 
> pattern "\1"
> ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name 
> pattern "\1"
> ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
> ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern 
> "\1"
> ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern 
> "\1"

 I guarantee you're going to end up fixing this issue over and over again 
 as more
 code is added in.
>>>
>>> This is certainly going to happen, but it should be quickly spotted by
>>> anybody running make tags on linux-next. And 10 instances since the
>>> beginning of git is not too many.
>>
>> Not everyone uses 'make tags'.  'make cscope' exists and functions correctly 
>> ;)
> 
> cscope works, but unfortunately it cannot be extended to understand the
> preprocessor constructs. But it does not suffer from the problem at
> hand, obviously.
> 
> 
>>> It's not ctags itself parsing the DEFINE_PER_CPU() macro, but a
>>> user-supplied regex specified on commandline. Which can only operate on
>>> single lines.
>>>
>>
>> What's the regex?
> 
> See
> https://lkml.kernel.org/r/1444940195-28272-9-git-send-email-mma...@suse.com
> 
> It used to require a closing parenthesis, so it would not match the
> multiline macro invocations at all. Now it matches them, but ctags
> correctly warns that the empty string is probably not what we intended
> to match.

It seems wrong to change kernel code, not for a bug, but for a userspace search.

P.

> 
> Michal
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Alan Burlison

On 22/10/2015 07:51, casper@oracle.com wrote:


It would more fruitful to have such a discussion in one of the OpenGroup
mailing lists; people gathered there have a lot of experience and it is
also possible to fix the standard when it turns out that it indeed as
vague as you claim it is (I don't quite agree with that)


+1. If there's interest in doing that I'll ask our POSIX rep the best 
way of initiating such a conversation.


--
Alan Burlison
--
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v3 2/2] iwlwifi: mvm: send large SKBs to the transport

2015-10-22 Thread Grumbach, Emmanuel
> 
> On Wed, 2015-10-21 at 21:34 +0300, Emmanuel Grumbach wrote:
> > +
> > +   if (skb->protocol == htons(ETH_P_IP)) {
> > +   ip_hdr(tmp)->id = ip_hdr(skb)->id;
> 
> Too late, you already called consume_skb(skb).
> So this is a potential use after free.

Ouch - thanks for catching this!

> 
> > +   be16_add_cpu(_hdr(tmp)->id, i *
> num_subframes);
> > +   }
> > +
> 
> 
> I would use
> base_id = ip_hdr(skb)->id; // before the consume_skb(skb)
> ip_hdr(tmp)->id = htons(base_id + i * num_subframes);
> 
Sure - I will.

I will send a v4 with the fixes.


Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Alan Burlison

On 22/10/2015 05:21, Al Viro wrote:


Most of the work on using a file descriptor is local to the thread.


Using - sure, but what of cacheline dirtied every time you resolve a
descriptor to file reference?


Don't you have to do that anyway, to do anything useful with the file?


How much does it cover and to what
degree is that local to thread?  When it's a refcount inside struct file -
no big deal, we'll be reading the same cacheline anyway and unless several
threads are pounding on the same file with syscalls at the same time,
that won't be a big deal.  But when refcounts are associated with
descriptors...


There is a refcount in the struct held in the per-process list of open 
files and the 'slow path' processing is only taken if there's more than 
one LWP in the process that's accessing the file.



In case of Linux we have two bitmaps and an array of pointers associated
with descriptor table.  They grow on demand (in parallel)
* reserving a descriptor is done under ->file_lock (dropped/regained
around memory allocation if we end up expanding the sucker, actual reassignment
of pointers to array/bitmaps is under that spinlock)
* installing a pointer is lockless (we wait for ongoing resize to
settle, RCU takes care of the rest)
* grabbing a file by index is lockless as well
* removing a pointer is under ->file_lock, so's replacing it by dup2().


Is that table per-process or global?


The point is, dup2() over _unused_ descriptor is inherently racy, but dup2()
over a descriptor we'd opened and kept open should be safe.  As it is,
their implementation boils down to "userland must do process-wide exclusion
between *any* dup2() and all syscalls that might create a new descriptor -
open()/pipe()/socket()/accept()/recvmsg()/dup()/etc".  At the very least,
it's a big QoI problem, especially since such userland exclusion would have
to be taken around the operations that can block for a long time.  Sure,
POSIX wording regarding dup2 is so weak that this behaviour can be argued
to be compliant, but... replacement of the opened file associated with
newfd really ought to be atomic to be of any use for multithreaded processes.


There's existing language in the Issue 7 dup2() description that says 
dup2() has to be atomic:


"the dup2( ) function provides unique services, as no other
interface is able to atomically replace an existing file descriptor."

And there is some new language in Issue 7 Technical Corrigenda 2 that 
reinforces that, when it's talking about reassignment of 
stdin/stdout/stderr:


"Furthermore, a close() followed by a reopen operation (e.g. open(), 
dup() etc) is not atomic; dup2() should be used to change standard file 
descriptors."


I don't think that it's possible to claim that a non-atomic dup2() is 
POSIX-compliant.



IOW, if newfd is an opened descriptor prior to dup2() and no other thread
attempts to close it by any means, there should be no window during which
it would appear to be not opened.  Linux and FreeBSD satisfy that; OpenBSD
seems to do the same, from the quick look.  NetBSD doesn't, no idea about
Solaris.  FWIW, older NetBSD implementation (prior to "File descriptor changes,
discussed on tech-kern:" back in 2008) used to behave like OpenBSD one; it
had fixed a lot of crap, so it's entirely possible that OpenBSD simply has
kept the old implementation, with tons of other races in that area, but this
dup2() race got introduced in that rewrite.


Related to dup2(), there's some rather surprising behaviour on Linux. 
Here's the scenario:


--
ThreadA opens, listens and accepts on socket fd1, waiting for incoming 
connections.


ThreadB waits for a while, then opens normal file fd2 for read/write.
ThreadB uses dup2 to make fd1 a clone of fd2.
ThreadB closes fd2.

ThreadA remains sat in accept on fd1 which is now a plain file, not a 
socket.


ThreadB writes to fd1, the result of which appears in the file, so fd1 
is indeed operating as a plain file.


ThreadB exits. ThreadA is still sat in accept on fd1.

A connection is made to fd1 by another process. The accept call succeeds 
and returns the incoming  connection. fd1 is still operating as a 
socket, even though it's now actually a plain file.

--

I assume this is another consequence of the fact that threads waiting in 
accept don't get a notification if the fd they are using is closed, 
either directly by a call to close or by a syscall such as dup2. Not 
waking up other threads on a fd when it is closed seems like it's 
undesirable behaviour.


I can see the reasoning behind allowing shutdown to be used to do such a 
wakeup even if that's not POSIX-compliant - it may make it slightly 
easier for applications avoid fd recycling races. However the current 
situation is that shutdown is the *only* way to perform such a wakeup - 
simply closing the fd has no effect on any other threads. That seems 
incorrect.


--
Alan Burlison
--
--
To unsubscribe from this list: send the line 

Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Alan Burlison

On 22/10/2015 05:44, Al Viro wrote:


It's been said that the current mechanisms in Linux & some BSD
variants can be subject to races


You do realize that it goes for the entire area?  And the races found
in this thread are in the BSD variant that tries to do something similar
to what you guys say Solaris is doing, so I'm not sure which way does
that argument go.  A high-level description with the same level of details
will be race-free in _all_ of them.  The devil is in the details, of course,
and historically they had been very easy to get wrong.  And extra complexity
in that area doesn't make things better.


Yes, I absolutely agree it's difficult to get it right. Modulo 
undetected bugs I believe the Solaris implementation is race free, and 
if it isn't I think it's fair to say we'd consider that to be a bug.



, and the behaviour exhibited
doesn't conform to POSIX, for example requiring the use of
shutdown() on unconnected sockets because close() doesn't kick off
other threads accept()ing on the same fd.


Umm...  The old kernels (and even more - old userland) are not going to
disappear, so we are stuck with such uses of shutdown(2) anyway, POSIX or
no POSIX.


Yes, I understand the problem and in an earlier part of the discussion I 
said that I suspected that all that could really be done was to document 
the behaviour as changing it would break existing code. I still think 
that's probably the only workable option.



I'd be interested to hear
if there's a better and more performant way of handling the
situation that doesn't involve doing the sort of bookkeeping Casper
described,.


So would a lot of other people.


:-)


To quote one of my colleague's favourite sayings: Performance is a
goal, correctness is a constraint.


Except that in this case "correctness" is the matter of rather obscure and
ill-documented areas in POSIX.  Don't get me wrong - this semantics isn't
inherently bad, but it's nowhere near being an absolute requirement.


I don't think it's *that* obscure, when I found the original shutdown() 
problem google showed up another occurrence pretty quickly - 
https://lists.gnu.org/archive/html/libmicrohttpd/2011-09/msg00024.html. 
If a fd is closed then allowing other uses of it to continue in other 
threads seems incorrect to me, as in the dup2() scenario I posted.


--
Alan Burlison
--
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Eric Dumazet
On Thu, 2015-10-22 at 08:15 +0200, casper@oracle.com wrote:
> >It's been said that the current mechanisms in Linux & some BSD variants 
> >can be subject to races, and the behaviour exhibited doesn't conform to 
> >POSIX, for example requiring the use of shutdown() on unconnected 
> >sockets because close() doesn't kick off other threads accept()ing on 
> >the same fd. I'd be interested to hear if there's a better and more 
> >performant way of handling the situation that doesn't involve doing the 
> >sort of bookkeeping Casper described,.
> 
> Of course, the implementation is now around 18 years old; clearly a lot of 
> things have changed since then.
> 
> In the particular case of Linux close() on a socket, surely it must be 
> possible to detect at close that it is a listening socket and that you are 
> about to close the last reference; the kernel could then do the shutdown() 
> all by itself.

We absolutely do not _want_ to do this just so that linux becomes slower
to the point Solaris can compete, or you guys can avoid some work.

close(fd) is very far from knowing a file is a 'listener' or even a
'socket' without extra cache line misses.

To force a close of an accept() or whatever blocking socket related
system call a shutdown() makes a lot of sense.

This would have zero additional overhead for the fast path.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Alan Burlison

On 22/10/2015 12:30, Eric Dumazet wrote:


We absolutely do not _want_ to do this just so that linux becomes slower
to the point Solaris can compete, or you guys can avoid some work.


Sentiments such as that really have no place in a discussion that's been 
focussed primarily on the behaviour of interfaces, albeit with 
digressions into the potential performance impacts. The discussion has 
been cordial and I for one appreciate Al Viro's posts on the subject, 
from which I've leaned a lot. Can we please keep it that way? Thanks.



close(fd) is very far from knowing a file is a 'listener' or even a
'socket' without extra cache line misses.

To force a close of an accept() or whatever blocking socket related
system call a shutdown() makes a lot of sense.

This would have zero additional overhead for the fast path.


Yes, that would I believe be a significant improvement.

--
Alan Burlison
--
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 2/2] iwlwifi: mvm: send large SKBs to the transport

2015-10-22 Thread Eric Dumazet
On Wed, 2015-10-21 at 21:34 +0300, Emmanuel Grumbach wrote:
> +
> + if (skb->protocol == htons(ETH_P_IP)) {
> + ip_hdr(tmp)->id = ip_hdr(skb)->id;

Too late, you already called consume_skb(skb).
So this is a potential use after free.

> + be16_add_cpu(_hdr(tmp)->id, i * num_subframes);
> + }
> +


I would use 

base_id = ip_hdr(skb)->id; // before the consume_skb(skb)

ip_hdr(tmp)->id = htons(base_id + i * num_subframes);


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Eric Dumazet
On Thu, 2015-10-22 at 12:58 +0100, Alan Burlison wrote:
> On 22/10/2015 12:30, Eric Dumazet wrote:
> 
> > We absolutely do not _want_ to do this just so that linux becomes slower
> > to the point Solaris can compete, or you guys can avoid some work.
> 
> Sentiments such as that really have no place in a discussion that's been 
> focussed primarily on the behaviour of interfaces, albeit with 
> digressions into the potential performance impacts. The discussion has 
> been cordial and I for one appreciate Al Viro's posts on the subject, 
> from which I've leaned a lot. Can we please keep it that way? Thanks.

Certainly not.

I am a major linux networking developper and wont accept linux is
hijacked by guys who never contributed to it, just so it meets their
unreasonable expectations.

We absolutely care about performance. And I do not care you focus on
POSIX crap.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net,stable] qmi_wwan: add Sierra Wireless MC74xx/EM74xx

2015-10-22 Thread Bjørn Mork
New device IDs shamelessly lifted from the vendor driver.

Signed-off-by: Bjørn Mork 
---
 drivers/net/usb/qmi_wwan.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 355842b85ee9..2a7c1be23c4f 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -765,6 +765,10 @@ static const struct usb_device_id products[] = {
{QMI_FIXED_INTF(0x1199, 0x9056, 8)},/* Sierra Wireless Modem */
{QMI_FIXED_INTF(0x1199, 0x9057, 8)},
{QMI_FIXED_INTF(0x1199, 0x9061, 8)},/* Sierra Wireless Modem */
+   {QMI_FIXED_INTF(0x1199, 0x9070, 8)},/* Sierra Wireless 
MC74xx/EM74xx */
+   {QMI_FIXED_INTF(0x1199, 0x9070, 10)},   /* Sierra Wireless 
MC74xx/EM74xx */
+   {QMI_FIXED_INTF(0x1199, 0x9071, 8)},/* Sierra Wireless 
MC74xx/EM74xx */
+   {QMI_FIXED_INTF(0x1199, 0x9071, 10)},   /* Sierra Wireless 
MC74xx/EM74xx */
{QMI_FIXED_INTF(0x1bbb, 0x011e, 4)},/* Telekom Speedstick LTE II 
(Alcatel One Touch L100V LTE) */
{QMI_FIXED_INTF(0x1bbb, 0x0203, 2)},/* Alcatel L800MA */
{QMI_FIXED_INTF(0x2357, 0x0201, 4)},/* TP-LINK HSUPA Modem MA180 */
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: igmp broadcast storm Red Hat EL7 3.1 Kernel

2015-10-22 Thread Jeffrey Merkey
On 8/31/15, Jeffrey Merkey  wrote:
> I was able to reproduce it last night with 2.6.32 on Centos 6.3 so it
> does not seem confined to just the 3.X tree.  I think this is
> something that's been lurking in the kernel for years and just has a
> hard time showing up.   Bugs that go away on their own come back on
> their own.
>
> I got a trace of the traffic but need one of the os.  I'll attempt to
> get a trace of this again since I seem to be able to reproduce it by
> routing NetFlix traffic through the router while I am streaming live
> videos on Linux.
>
> I will setup the system to trap the bug and post the traces.
>
> Jeff
>
>
> On 8/31/15, Cong Wang  wrote:
>> (Cc'ing netdev)
>>
>> On Sat, Aug 29, 2015 at 3:50 PM, Jeffrey Merkey 
>> wrote:
>>> When installing Centos 7 running kernel 3.1 (Centos 7 distro) the
>>> system generates an igmp broadcast storm which locks up and knocks
>>> down a Century Link DSL router.  Powering the router off clears the
>>> broadcast storm.
>>>
>>> The bug is kernel related and resembles very closely another bug seen
>>> about 10 years ago with Caldera's distro's on 2.4.  The bug occurs on
>>> the ia32 beta build.  The bug DOES NOT exist in the 64 bit x86_64
>>> versions.I will attempt to debug it further since I have my own
>>> debugger and try to run it down and provide more info.
>>>
>>
>> A reproducer and/or bisect would help a lot.
>>
>> Thanks.
>>
>

I got to the bottom of this, however, I am not convinced it's a bug in
linux any more than configuring a bridge in a loop is a bug.  I
addressed it to the ISP since this is where the problem lies, even
though Linux contributes routed traffic to the problem.

Jeff
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sh_eth.c::sh_eth_rx(): mdp->rx_skbuff[entry] can be NULL

2015-10-22 Thread Sergei Shtylyov

Hello.

On 10/21/2015 10:26 AM, Yasushi SHOJI wrote:


Thank your for your reply.


   Not at all, I'm virtually a maintainer for that driver now, so trying to 
filter out the related mails even if I don't have time to read thru all the 
netdev mail.



On 10/19/2015 06:01 PM, Yasushi SHOJI wrote:


I'm not that familiar with this code base so I'm note including any
patch yet.  I appreciate if someone with insight in this code give a
quick look and tell me that it's a real one or not.  if this is a real
case, I can take a deep look.


If you got the oops, it's real. Thanks for the reporting. I guess I
should check the new ravb driver as well...
Do you want to try fixing the bug yourself?



Sure.  I can dive in to this.  I appreciate if someone who has worked
on sh_eth.c give me some design advises or tell me the initial design
thoughts / what was the intention when allocation if failed.


   Hm, well, I seem to have some time to spend on fixing the issues in this 
driver (I noticed a couple while doing the AVB driver), so spending time on 
your "education" would seem somewhat inefficient... :-)



My idea right now is to simply invalidate the descriptor when
netdev_alloc_skb() failed.


   Well, it depends. If you're talking about the second loop in sh_eth_rx(), 
that seems a good idea (and it's what I've done for the dma_mapping_error() 
case in the ravb driver -- I just set the descriptor's data size field to 0). 
The OOM case seems to have been un-addressed in both drivers so far... If we 
take sh_eth_ring_format(), I believe the best course of action is to just fail 
on OOM since the driver doesn't correctly handle that case anyway AFAIR; and 
that was implemented in the ravb driver.



When next packet arrived, in near future,
the driver can try again to allocate the buffer and update the
corresponding descriptor if succeeds.


   It would be too late, unless you still mean the RX refilling loop in this 
function.



If memory is not yet available
when the controller is trying to use the invalid descriptor, the
controller will see it and DMA will stop.


   That means leaving RACT=0 and that's what the driver is even doing...
   Hm, then I don't understand how the error you've described can occur, 
unless we encounter OOM during sh_eth_ring_format()...



Is it acceptable path to go?


   I'm not seeing a bug in this function, perhaps I'm missing something?


Here is how I understand this driver:


[...]


The driver utilizes array of sk_buffs for tx and rx.  For rx, the
driver has an array of pointers of sk_buffs, rx_skbuff[]. This
rx_skbuff[] is filled with sk_buffs in sh_eth_ring_format() which is
called when the driver is open()ed.

The controller, the driver is targeted to, is GETHER.


   Well, it depends on your SoC, it may be 100 Mbps Ether.


A receive descriptor corresponds to one sk_buff.  The controller
expects array of descriptors in the system memory and treat it as a
ring, meaning that the controller process each descriptor one by one.
Once the controller finished the last descriptor, it will go back to
the first one.


   Yes, it seems a correct description.


To achieve zero copy, the driver push the sk_buffs filled with
received packet to the netdev core with netif_receive_skb() then
netdev_alloc_skb() sk_buffs in the sh_eth_rx(), the poll method of the
driver, and update the corresponding descriptor.



If the allocation failed, it just leave the function, leaving old
pointer in the descriptor as is.


   Yes, but note that it also leaves RACT=0, which basically means an invalid 
descriptor, encountering which the reception should just stop.



In some future, the controller will
access the descriptor and writes to the old memory address. (I haven't
checked the state of bits in the descriptor yet)


  Check it.


BTW, if any one has a bit of time, I have questions regarding to the
atomic allocation:


   Sorry, I'm constantly short of time. Someone else will have to answer 
that. :-)


MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net v2] openvswitch: Fix egress tunnel info.

2015-10-22 Thread kbuild test robot
Hi Pravin,

[auto build test WARNING on net/master -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Pravin-B-Shelar/openvswitch-Fix-egress-tunnel-info/20151023-053247
config: x86_64-randconfig-x010-10211707 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net/openvswitch/actions.c: In function 'output_userspace':
>> net/openvswitch/actions.c:771:24: warning: unused variable 'info' 
>> [-Wunused-variable]
 struct ip_tunnel_info info;
   ^

vim +/info +771 net/openvswitch/actions.c

7f8a436e Joe Stringer 2015-08-26  755   
ethertype = vlan_get_protocol(skb);
7f8a436e Joe Stringer 2015-08-26  756   }
7f8a436e Joe Stringer 2015-08-26  757  
7f8a436e Joe Stringer 2015-08-26  758   ovs_fragment(vport, 
skb, mru, ethertype);
7f8a436e Joe Stringer 2015-08-26  759   } else {
7f8a436e Joe Stringer 2015-08-26  760   kfree_skb(skb);
7f8a436e Joe Stringer 2015-08-26  761   }
7f8a436e Joe Stringer 2015-08-26  762   } else {
738967b8 Andy Zhou2014-09-08  763   kfree_skb(skb);
ccb1352e Jesse Gross  2011-10-25  764   }
7f8a436e Joe Stringer 2015-08-26  765  }
ccb1352e Jesse Gross  2011-10-25  766  
ccb1352e Jesse Gross  2011-10-25  767  static int output_userspace(struct 
datapath *dp, struct sk_buff *skb,
ccea7445 Neil McKee   2015-05-26  768   struct sw_flow_key 
*key, const struct nlattr *attr,
ccea7445 Neil McKee   2015-05-26  769   const struct nlattr 
*actions, int actions_len)
ccb1352e Jesse Gross  2011-10-25  770  {
1d8fff90 Thomas Graf  2015-07-21 @771   struct ip_tunnel_info info;
ccb1352e Jesse Gross  2011-10-25  772   struct dp_upcall_info upcall;
ccb1352e Jesse Gross  2011-10-25  773   const struct nlattr *a;
ccb1352e Jesse Gross  2011-10-25  774   int rem;
ccb1352e Jesse Gross  2011-10-25  775  
ccea7445 Neil McKee   2015-05-26  776   memset(, 0, sizeof(upcall));
ccb1352e Jesse Gross  2011-10-25  777   upcall.cmd = OVS_PACKET_CMD_ACTION;
7f8a436e Joe Stringer 2015-08-26  778   upcall.mru = OVS_CB(skb)->mru;
ccb1352e Jesse Gross  2011-10-25  779  

:: The code at line 771 was first introduced by commit
:: 1d8fff907342d2339796dbd27ea47d0e76a6a2d0 ip_tunnel: Make ovs_tunnel_info 
and ovs_key_ipv4_tunnel generic

:: TO: Thomas Graf 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH 3/3] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-10-22 Thread kbuild test robot
Hi Mans,

[auto build test WARNING on net/master -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Mans-Rullgard/devicetree-add-vendor-prefix-for-Aurora-VLSI/20151022-220753
config: sparc64-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/aurora/nb8800.c: In function 'nb8800_xmit':
   drivers/net/ethernet/aurora/nb8800.c:381:14: warning: cast from pointer to 
integer of different size [-Wpointer-to-int-cast]
 cpsz = (8 - (u32)skb->data) & 7;
 ^
   drivers/net/ethernet/aurora/nb8800.c: In function 'nb8800_probe':
>> drivers/net/ethernet/aurora/nb8800.c:1006:2: warning: format '%x' expects 
>> argument of type 'unsigned int', but argument 3 has type 'resource_size_t' 
>> [-Wformat=]
 dev_info(>dev, "AU-NB8800 Ethernet at 0x%x\n", res->start);
 ^

vim +1006 drivers/net/ethernet/aurora/nb8800.c

   990  res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
   991  if (!res) {
   992  dev_err(>dev, "No MMIO base\n");
   993  return -EINVAL;
   994  }
   995  
   996  irq = platform_get_irq(pdev, 0);
   997  if (irq <= 0) {
   998  dev_err(>dev, "No IRQ\n");
   999  return -EINVAL;
  1000  }
  1001  
  1002  base = devm_ioremap_resource(>dev, res);
  1003  if (IS_ERR(base))
  1004  return PTR_ERR(base);
  1005  
> 1006  dev_info(>dev, "AU-NB8800 Ethernet at 0x%x\n", 
> res->start);
  1007  
  1008  dev = alloc_etherdev(sizeof(*priv));
  1009  if (!dev)
  1010  return -ENOMEM;
  1011  
  1012  platform_set_drvdata(pdev, dev);
  1013  SET_NETDEV_DEV(dev, >dev);
  1014  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v6] can: xilinx: Convert to runtime_pm

2015-10-22 Thread Sören Brinkmann
Hi Kedar,

On Thu, 2015-10-22 at 10:15AM +0530, Kedareswara rao Appana wrote:
> Instead of enabling/disabling clocks at several locations in the driver,
> Use the runtime_pm framework. This consolidates the actions for runtime PM
> In the appropriate callbacks and makes the driver more readable and 
> mantainable.
> 
> Signed-off-by: Kedareswara rao Appana 
[...]
>  /**
>   * xcan_probe - Platform registration call
> @@ -1072,7 +1103,7 @@ static int xcan_probe(struct platform_device *pdev)
>   return -ENOMEM;
>  
>   priv = netdev_priv(ndev);
> - priv->dev = ndev;
> + priv->dev = >dev;
>   priv->can.bittiming_const = _bittiming_const;
>   priv->can.do_set_mode = xcan_do_set_mode;
>   priv->can.do_get_berr_counter = xcan_get_berr_counter;
> @@ -1114,21 +1145,30 @@ static int xcan_probe(struct platform_device *pdev)
>   }
>   }
>  
> - ret = clk_prepare_enable(priv->can_clk);
> + ret = clk_prepare(priv->can_clk);
>   if (ret) {
>   dev_err(>dev, "unable to enable device clock\n");
>   goto err_free;
>   }
>  
> - ret = clk_prepare_enable(priv->bus_clk);
> + ret = clk_prepare(priv->bus_clk);

Are these clk_prepare calls needed here? The runtime PM calls do
clk_prepare_enable and clk_disable_unprepare.

>   if (ret) {
>   dev_err(>dev, "unable to enable bus clock\n");
> - goto err_unprepare_disable_dev;
> + goto err_unprepare_dev;
>   }
>  
>   priv->write_reg = xcan_write_reg_le;
>   priv->read_reg = xcan_read_reg_le;
>  
> + pm_runtime_irq_safe(>dev);
> + pm_runtime_enable(>dev);
> + ret = pm_runtime_get_sync(>dev);
> + if (ret < 0) {
> + netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> + __func__, ret);
> + goto err_unprepare_busclk;
> + }
> +
>   if (priv->read_reg(priv, XCAN_SR_OFFSET) != XCAN_SR_CONFIG_MASK) {
>   priv->write_reg = xcan_write_reg_be;
>   priv->read_reg = xcan_read_reg_be;
> @@ -1141,22 +1181,26 @@ static int xcan_probe(struct platform_device *pdev)
>   ret = register_candev(ndev);
>   if (ret) {
>   dev_err(>dev, "fail to register failed (err=%d)\n", ret);
> - goto err_unprepare_disable_busclk;
> + goto err_disableclks;
>   }
>  
>   devm_can_led_init(ndev);
> - clk_disable_unprepare(priv->bus_clk);
> - clk_disable_unprepare(priv->can_clk);
> +
> + pm_runtime_put(>dev);
> +
>   netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n",
>   priv->reg_base, ndev->irq, priv->can.clock.freq,
>   priv->tx_max);
>  
>   return 0;
>  
> -err_unprepare_disable_busclk:
> - clk_disable_unprepare(priv->bus_clk);
> -err_unprepare_disable_dev:
> - clk_disable_unprepare(priv->can_clk);
> +err_disableclks:
> + pm_runtime_put(priv->dev);
> +err_unprepare_busclk:
> + pm_runtime_disable(>dev);
> + clk_unprepare(priv->bus_clk);
> +err_unprepare_dev:
> + clk_unprepare(priv->can_clk);
>  err_free:
>   free_candev(ndev);
>  err:
> @@ -1175,11 +1219,11 @@ static int xcan_remove(struct platform_device *pdev)
>   struct net_device *ndev = platform_get_drvdata(pdev);
>   struct xcan_priv *priv = netdev_priv(ndev);
>  
> - if (set_reset_mode(ndev) < 0)
> - netdev_err(ndev, "mode resetting failed!\n");
> -
>   unregister_candev(ndev);
> + pm_runtime_disable(>dev);
>   netif_napi_del(>napi);
> + clk_unprepare(priv->bus_clk);
> + clk_unprepare(priv->can_clk);

I think this can go away when the prepare calls in probe go away.

Thanks,
Sören
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Alan Burlison

On 22/10/15 19:16, Al Viro wrote:


Don't you have to do that anyway, to do anything useful with the file?


Dirtying the cacheline that contains struct file itself is different, but
that's not per-descriptor.


Yes, true enough.


Usually it's per-process, but any thread could ask for a private instance
to work with (and then spawn more threads sharing that instance - or getting
independent copies).


[snip]

Thanks again for the info, interesting. Solaris also does things along 
the same lines. In fact we recently extended posix_spawn so it could be 
used by the JVM to create subprocesses without wholescale copying, and 
Casper has done some optimisation work on posix_spawn as well.



I don't think that it's possible to claim that a non-atomic dup2()
is POSIX-compliant.


Except that it's in non-normative part of dup2(2), AFAICS.  I certainly
agree that it would be a standard lawyering beyond reason, but "not
possible to claim" is too optimistic.  Maybe I'm just more cynical...


Possibly so, and possibly justifiably so as well ;-)


ThreadA remains sat in accept on fd1 which is now a plain file, not
a socket.


No.  accept() is not an operation on file descriptors; it's an operation on
file descriptions (pardon for use of that terminology).  They are specified
by passing descriptors, but there's a hell of a difference between e.g.
dup() or fcntl(,F_SETFD,) (operations on descriptors) and read() or lseek()
(operations on descriptions).

Lookups are done once per syscall; the only exception is F_SETFL{,W}, where
we recheck that descriptor is refering to the same thing before granting
the lock.


Yes, but if you believe that dup2() requires an implicit close() within 
it and that dup2() has to be atomic then expecting that other threads 
waiting on the same fd in accept() will get a notification seems 
reasonable enough.



Again, POSIX is still underspecifying the semantics of shared descriptor
tables; back when the bulk of it had been written there had been no way
to have a descriptor -> description mapping changed under a syscall by
action of another thread.  Hell, they still hadn't picked on some things
that happened in early 80s, let alone early-to-mid 90s...


That's indisputably true - much of the POSIX behaviour predates threads 
and it shows, quite badly, in some places.



Linux and Solaris happen to cover these gaps differently; FreeBSD and
OpenBSD are probably closer to Linux variant, NetBSD - to Solaris one.


Yes, true enough.

--
Alan Burlison
--
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net v2] openvswitch: Fix egress tunnel info.

2015-10-22 Thread Pravin B Shelar
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices.  Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar 
---
v1-v2:
- changed ndo operation name to ndo_fill_metadata_dst()
- Fix geneve stats update
---
 drivers/net/geneve.c   |   40 ++-
 drivers/net/vxlan.c|   41 
 include/linux/netdevice.h  |7 +
 include/net/dst_metadata.h |   32 ++
 net/core/dev.c |   27 ++
 net/ipv4/ip_gre.c  |   46 +--
 net/openvswitch/actions.c  |8 ++---
 net/openvswitch/datapath.c |5 +--
 net/openvswitch/datapath.h |1 -
 net/openvswitch/flow_netlink.c |   18 +---
 net/openvswitch/flow_netlink.h |6 ++--
 net/openvswitch/vport-geneve.c |   13 -
 net/openvswitch/vport-gre.c|8 -
 net/openvswitch/vport-vxlan.c  |   19 -
 net/openvswitch/vport.c|   58 
 net/openvswitch/vport.h|   35 
 16 files changed, 192 insertions(+), 172 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index cde29f8..445071c 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -594,14 +594,12 @@ static struct rtable *geneve_get_rt(struct sk_buff *skb,
rt = ip_route_output_key(geneve->net, fl4);
if (IS_ERR(rt)) {
netdev_dbg(dev, "no route to %pI4\n", >daddr);
-   dev->stats.tx_carrier_errors++;
-   return rt;
+   return ERR_PTR(-ENETUNREACH);
}
if (rt->dst.dev == dev) { /* is this necessary? */
netdev_dbg(dev, "circular route to %pI4\n", >daddr);
-   dev->stats.collisions++;
ip_rt_put(rt);
-   return ERR_PTR(-EINVAL);
+   return ERR_PTR(-ELOOP);
}
return rt;
 }
@@ -627,12 +625,12 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, 
struct net_device *dev)
struct ip_tunnel_info *info = NULL;
struct rtable *rt = NULL;
const struct iphdr *iip; /* interior IP header */
+   int err = -EINVAL;
struct flowi4 fl4;
__u8 tos, ttl;
__be16 sport;
bool udp_csum;
__be16 df;
-   int err;
 
if (geneve->collect_md) {
info = skb_tunnel_info(skb);
@@ -647,7 +645,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct 
net_device *dev)
rt = geneve_get_rt(skb, dev, , info);
if (IS_ERR(rt)) {
netdev_dbg(dev, "no route to %pI4\n", );
-   dev->stats.tx_carrier_errors++;
+   err = PTR_ERR(rt);
goto tx_error;
}
 
@@ -699,10 +697,37 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, 
struct net_device *dev)
 tx_error:
dev_kfree_skb(skb);
 err:
-   dev->stats.tx_errors++;
+   if (err == -ELOOP)
+   dev->stats.collisions++;
+   else if (err == -ENETUNREACH)
+   dev->stats.tx_carrier_errors++;
+   else
+   dev->stats.tx_errors++;
return NETDEV_TX_OK;
 }
 
+static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff 
*skb)
+{
+   struct ip_tunnel_info *info = skb_tunnel_info(skb);
+   struct geneve_dev *geneve = netdev_priv(dev);
+   struct rtable *rt;
+   struct flowi4 fl4;
+
+   if (ip_tunnel_info_af(info) != AF_INET)
+   return -EINVAL;
+
+   rt = geneve_get_rt(skb, dev, , info);
+   if (IS_ERR(rt))
+   return PTR_ERR(rt);
+
+   ip_rt_put(rt);
+   info->key.u.ipv4.src = fl4.saddr;
+   info->key.tp_src = udp_flow_src_port(geneve->net, skb,
+1, USHRT_MAX, true);
+   info->key.tp_dst = geneve->dst_port;
+   return 0;
+}
+
 static const struct net_device_ops geneve_netdev_ops = {
.ndo_init   = geneve_init,
.ndo_uninit = geneve_uninit,
@@ -713,6 +738,7 @@ static const struct net_device_ops geneve_netdev_ops = {
.ndo_change_mtu = eth_change_mtu,
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_mac_address= eth_mac_addr,
+   .ndo_fill_metadata_dst  = geneve_fill_metadata_dst,
 };
 
 static void geneve_get_drvinfo(struct net_device *dev,
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index afdc65f..c1587ec 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2337,6 +2337,46 @@ 

Re: CONFIG_XPS depends on L1_CACHE_BYTES being greater than sizeof(struct xps_map)

2015-10-22 Thread Tom Herbert
On Thu, Oct 22, 2015 at 1:00 PM, Helge Deller  wrote:
> Hi Tom & David,
>
> I've queued-up a patch for the parisc architecture which reduces 
> L1_CACHE_BYTES from 32 to 16:
>   https://patchwork.kernel.org/patch/7399291/
>
> But this change will break the kernel build like this:
>
> In file included from net/core/dev.c:92:0:
> net/core/dev.c: In function ‘expand_xps_map’:
> include/linux/netdevice.h:721:27: warning: overflow in implicit constant 
> conversion [-Woverflow]
>#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map)) \
> net/core/dev.c:1972:18: note: in expansion of macro ‘XPS_MIN_MAP_ALLOC’
>int alloc_len = XPS_MIN_MAP_ALLOC;
>
> Do you see an easy way to fix this ?
>
How about

 #define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - ((sizeof(struct xps_map)
% L1_CACHE_BYTES)) \

Tom

> Thanks,
> Helge
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_XPS depends on L1_CACHE_BYTES being greater than sizeof(struct xps_map)

2015-10-22 Thread Eric Dumazet
On Thu, 2015-10-22 at 22:00 +0200, Helge Deller wrote:
> Hi Tom & David,
> 
> I've queued-up a patch for the parisc architecture which reduces 
> L1_CACHE_BYTES from 32 to 16:
>   https://patchwork.kernel.org/patch/7399291/
> 
> But this change will break the kernel build like this:
> 
> In file included from net/core/dev.c:92:0:
> net/core/dev.c: In function ‘expand_xps_map’:
> include/linux/netdevice.h:721:27: warning: overflow in implicit constant 
> conversion [-Woverflow]
>#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map)) \
> net/core/dev.c:1972:18: note: in expansion of macro ‘XPS_MIN_MAP_ALLOC’
>int alloc_len = XPS_MIN_MAP_ALLOC;
> 
> Do you see an easy way to fix this ?


Using L2_CACHE_BYTES would be better, but it unfortunately does not
exist.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-22 Thread Al Viro
On Thu, Oct 22, 2015 at 09:51:05PM +0200, casper@oracle.com wrote:
> 
> >On Thu, Oct 22, 2015 at 08:24:51PM +0200, casper@oracle.com wrote:
> >
> >> The external behaviour atomic; you cannot distinguish the order
> >> between the closing of the original file (and waking up other threads
> >> waiting for a record lock) or changing the file referenced by that newfd.
> >> 
> >> But this not include a global or process specific lock.
> >
> >Interesting...  Do you mean that decriptor-to-file lookup blocks until that
> >rundown finishes?
> 
> For that particular file descriptor, yes.  (I'm assuming you mean the 
> Solaris kernel running down all lwps who have a system in progress on that 
> particular file descriptor).  All other fd to file lookups are not blocked 
> at all by this locking.
> 
> It should be clear that any such occurrences are application errors and 
> should be hardly ever seen in practice.  It is also known when this is 
> needed so it is hardly even attempted.

Ho-hum...  It could even be made lockless in fast path; the problems I see
are
* descriptor-to-file lookup becomes unsafe in a lot of locking
conditions.  Sure, most of that happens on the entry to some syscall, with
very light locking environment, but... auditing every sodding ioctl that
might be doing such lookups is an interesting exercise, and then there are
->mount() instances doing the same thing.  And procfs accesses.  Probably
nothing impossible to deal with, but nothing pleasant either.
* memory footprint.  In case of Linux on amd64 or sparc64,
main()
{
int i;
for (i = 0; i < 1<<24; dup2(0, i++))// 16M descriptors
;
}
will chew 132Mb of kernel data (16Mpointer + 32Mbit, assuming sufficient ulimit 
-n,
of course).  How much will Solaris eat on the same?
* related to the above - how much cacheline sharing will that involve?
These per-descriptor use counts are bitch to pack, and giving each a cacheline
of its own...  
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] net: dsa: mv88e6xxx: remove debugfs interface

2015-10-22 Thread Vivien Didelot
It is preferable to have a common debugfs interface for DSA or switchdev
instead of a driver specific one. Thus remove the mv88e6xxx debug code.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 291 
 drivers/net/dsa/mv88e6xxx.h |   2 -
 2 files changed, 293 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 39664d8..7af9630 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -11,7 +11,6 @@
  * (at your option) any later version.
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -21,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include "mv88e6xxx.h"
@@ -876,13 +874,6 @@ static int _mv88e6xxx_atu_wait(struct dsa_switch *ds)
   GLOBAL_ATU_OP_BUSY);
 }
 
-/* Must be called with SMI lock held */
-static int _mv88e6xxx_scratch_wait(struct dsa_switch *ds)
-{
-   return _mv88e6xxx_wait(ds, REG_GLOBAL2, GLOBAL2_SCRATCH_MISC,
-  GLOBAL2_SCRATCH_BUSY);
-}
-
 /* Must be called with SMI mutex held */
 static int _mv88e6xxx_phy_read_indirect(struct dsa_switch *ds, int addr,
int regnum)
@@ -2107,273 +2098,9 @@ int mv88e6xxx_setup_ports(struct dsa_switch *ds)
return 0;
 }
 
-static int mv88e6xxx_regs_show(struct seq_file *s, void *p)
-{
-   struct dsa_switch *ds = s->private;
-
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int reg, port;
-
-   seq_puts(s, "GLOBAL GLOBAL2 ");
-   for (port = 0 ; port < ps->num_ports; port++)
-   seq_printf(s, " %2d  ", port);
-   seq_puts(s, "\n");
-
-   for (reg = 0; reg < 32; reg++) {
-   seq_printf(s, "%2x: ", reg);
-   seq_printf(s, " %4x%4x  ",
-  mv88e6xxx_reg_read(ds, REG_GLOBAL, reg),
-  mv88e6xxx_reg_read(ds, REG_GLOBAL2, reg));
-
-   for (port = 0 ; port < ps->num_ports; port++)
-   seq_printf(s, "%4x ",
-  mv88e6xxx_reg_read(ds, REG_PORT(port), reg));
-   seq_puts(s, "\n");
-   }
-
-   return 0;
-}
-
-static int mv88e6xxx_regs_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, mv88e6xxx_regs_show, inode->i_private);
-}
-
-static const struct file_operations mv88e6xxx_regs_fops = {
-   .open   = mv88e6xxx_regs_open,
-   .read   = seq_read,
-   .llseek = no_llseek,
-   .release = single_release,
-   .owner  = THIS_MODULE,
-};
-
-static void mv88e6xxx_atu_show_header(struct seq_file *s)
-{
-   seq_puts(s, "DB   T/P  Vec State Addr\n");
-}
-
-static void mv88e6xxx_atu_show_entry(struct seq_file *s, int dbnum,
-unsigned char *addr, int data)
-{
-   bool trunk = !!(data & GLOBAL_ATU_DATA_TRUNK);
-   int portvec = ((data & GLOBAL_ATU_DATA_PORT_VECTOR_MASK) >>
-  GLOBAL_ATU_DATA_PORT_VECTOR_SHIFT);
-   int state = data & GLOBAL_ATU_DATA_STATE_MASK;
-
-   seq_printf(s, "%03x %5s %10pb   %x   %pM\n",
-  dbnum, (trunk ? "Trunk" : "Port"), , state, addr);
-}
-
-static int mv88e6xxx_atu_show_db(struct seq_file *s, struct dsa_switch *ds,
-int dbnum)
-{
-   unsigned char bcast[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
-   unsigned char addr[6];
-   int ret, data, state;
-
-   ret = _mv88e6xxx_atu_mac_write(ds, bcast);
-   if (ret < 0)
-   return ret;
-
-   do {
-   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_ATU_FID,
-  dbnum);
-   if (ret < 0)
-   return ret;
-
-   ret = _mv88e6xxx_atu_cmd(ds, GLOBAL_ATU_OP_GET_NEXT_DB);
-   if (ret < 0)
-   return ret;
-
-   data = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_ATU_DATA);
-   if (data < 0)
-   return data;
-
-   state = data & GLOBAL_ATU_DATA_STATE_MASK;
-   if (state == GLOBAL_ATU_DATA_STATE_UNUSED)
-   break;
-   ret = _mv88e6xxx_atu_mac_read(ds, addr);
-   if (ret < 0)
-   return ret;
-   mv88e6xxx_atu_show_entry(s, dbnum, addr, data);
-   } while (state != GLOBAL_ATU_DATA_STATE_UNUSED);
-
-   return 0;
-}
-
-static int mv88e6xxx_atu_show(struct seq_file *s, void *p)
-{
-   struct dsa_switch *ds = s->private;
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int dbnum;
-
-   mv88e6xxx_atu_show_header(s);
-
-   for (dbnum = 0; dbnum < 255; dbnum++) {
-   mutex_lock(>smi_mutex);
-   mv88e6xxx_atu_show_db(s, ds, dbnum);
-   mutex_unlock(>smi_mutex);
-   }
-
-   return 0;

[PATCH net-next v5 0/2] mpls: multipath support

2015-10-22 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
labels to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (1):
  mpls: multipath support

Robert Shearman (1):
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


v2:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
v3:
- Move back to arrays (same as v1), also suggested by Eric Biederman

v4:
- address a few comments from Eric Biederman
Plan to address the following pending comments in incremental patches 
after this
infrastructure changes go in.
- Move VIA size to 16 bytes
- use ipv6 flow label in ecmp calculations
- dead route handling during multipath route selection (I had planned 
this in
an incremental patch initially).

v5:
feedback from Eric Biederman
- Removed some dead code
feedback from Robert
- Moved dev_put into find_outdev to make it clear that we dont 
need
a hold on the dev because we are under rtnl
- move the unused variable fix into the correct patch file

 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v5 1/2] mpls: multipath route support

2015-10-22 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as a array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all labels to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
mpls_route_update though implemented to update based on dev, it was
never used that way. And the dev handling gets tricky with multiple nexthops.
Cannot match against any single nexthops dev. So, this patch removes the unused
'dev' handling in mpls_route_update.

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 493 +++-
 net/mpls/internal.h |  52 -
 3 files changed, 398 insertions(+), 149 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct 
lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bb185a2..3f95499 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,37 +19,9 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this 
entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -80,10 +52,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,6 +77,12 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   /* assume single nexthop for now */
+   return >rt_nh[0];
+}
+
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
@@ -159,6 +137,7 @@ static int mpls_forward(struct sk_buff *skb, struct 
net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nh *nh;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -196,8 +175,12 @@ static int mpls_forward(struct sk_buff *skb, struct 
net_device *dev,
if (!rt)
goto drop;
 
+   nh = mpls_select_multipath(rt);
+   if (!nh)
+   goto 

[PATCH net-next v5 2/2] mpls: flow-based multipath selection

2015-10-22 Thread Roopa Prabhu
From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 87 +++---
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 3f95499..c6392aa 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
-   /* assume single nexthop for now */
-   return >rt_nh[0];
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
+   bool eli_seen = false;
+   int label_index;
+   int nh_index = 0;
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (likely(dec.label >= MPLS_LABEL_FIRST_UNRESERVED)) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
+   } else if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   }
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, sizeof(*hdr) * label_index +
+ sizeof(struct ipv6hdr))) {
+   const struct ipv6hdr *v6hdr;
+
+   v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+   label_index);
+
+   hash = __ipv6_addr_jhash(>saddr, hash);
+   hash = __ipv6_addr_jhash(>daddr, hash);
+   hash = jhash_1word(v6hdr->nexthdr, hash);
+   }
+   }
+   }
+
+   nh_index = hash % rt->rt_nhn;
+out:
+   return >rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -175,7 +248,7 @@ static int mpls_forward(struct sk_buff *skb, struct 
net_device *dev,
if (!rt)
goto drop;
 
- 

[PATCH 2/2] e1000e: Fix msi-x interrupt automask

2015-10-22 Thread Benjamin Poirier
Since the introduction of 82574 support in e1000e, the driver has worked on
the assumption that msi-x interrupt generation is automatically disabled
after each irq. As it turns out, this is not the case. Currently, rx
interrupts can fire multiple times before and during napi processing. This
can be a problem for users because frames that arrive in a certain window
(after adapter->clean_rx() but before napi_complete_done() has cleared
NAPI_STATE_SCHED) generate an interrupt which does not lead to
napi_schedule(). These frames sit in the rx queue until another frame
arrives (a tcp retransmit for example).

While the EIAC and CTRL_EXT registers are properly configured for irq
automask, the modification of IAM in e1000_configure_msix() is what
prevents automask from working as intended.

This patch removes that erroneous write and fixes interrupt rearming for tx
and "other" interrupts. Since e1000_msix_other() reads ICR, all interrupts
must be rearmed in that function.

Reported-by: Frank Steiner 
Signed-off-by: Benjamin Poirier 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index a228167..8881256 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1921,7 +1921,8 @@ static irqreturn_t e1000_msix_other(int __always_unused 
irq, void *data)
 
 no_link_interrupt:
if (!test_bit(__E1000_DOWN, >state))
-   ew32(IMS, E1000_IMS_LSC | E1000_IMS_OTHER);
+   ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER |
+E1000_IMS_LSC);
 
return IRQ_HANDLED;
 }
@@ -1940,6 +1941,9 @@ static irqreturn_t e1000_intr_msix_tx(int __always_unused 
irq, void *data)
/* Ring was not completely cleaned, so fire another interrupt */
ew32(ICS, tx_ring->ims_val);
 
+   if (!test_bit(__E1000_DOWN, >state))
+   ew32(IMS, E1000_IMS_TXQ0);
+
return IRQ_HANDLED;
 }
 
@@ -2027,11 +2031,7 @@ static void e1000_configure_msix(struct e1000_adapter 
*adapter)
 
/* enable MSI-X PBA support */
ctrl_ext = er32(CTRL_EXT);
-   ctrl_ext |= E1000_CTRL_EXT_PBA_CLR;
-
-   /* Auto-Mask Other interrupts upon ICR read */
-   ew32(IAM, ~E1000_EIAC_MASK_82574 | E1000_IMS_OTHER);
-   ctrl_ext |= E1000_CTRL_EXT_EIAME;
+   ctrl_ext |= E1000_CTRL_EXT_PBA_CLR | E1000_CTRL_EXT_EIAME;
ew32(CTRL_EXT, ctrl_ext);
e1e_flush();
 }
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] e1000e: remove unreachable code

2015-10-22 Thread Benjamin Poirier
msi-x interrupts are not shared so there's no need to check if the
interrupt was really from this adapter.

Signed-off-by: Benjamin Poirier 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 0a854a4..a228167 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1907,12 +1907,6 @@ static irqreturn_t e1000_msix_other(int __always_unused 
irq, void *data)
struct e1000_hw *hw = >hw;
u32 icr = er32(ICR);
 
-   if (!(icr & E1000_ICR_INT_ASSERTED)) {
-   if (!test_bit(__E1000_DOWN, >state))
-   ew32(IMS, E1000_IMS_OTHER);
-   return IRQ_NONE;
-   }
-
if (icr & adapter->eiac_mask)
ew32(ICS, (icr & adapter->eiac_mask));
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 net-next] bpf: fix bpf_perf_event_read() helper

2015-10-22 Thread Alexei Starovoitov
Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to perf_event_array map
  (do this check statically at map insertion time)
- dynamically check that event is local and !pmu->count
Otherwise buggy bpf program can cause kernel splat.

Also fix error path after perf_event_attrs()
and remove redundant 'extern'.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get 
the selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov 
---
v2->v3:
. refactor checks based on Wangnan's and Peter's feedback
while refactoring realized that these two issues need fixes as well:
. fix perf_event_attrs() error path
. remove redundant extern

v1->v2: fix compile in case of !CONFIG_PERF_EVENTS

Even in the worst case the crash is not possible.
Only warn_on_once, so imo net-next is ok.

 include/linux/bpf.h  |1 -
 kernel/bpf/arraymap.c|   25 -
 kernel/trace/bpf_trace.c |7 ++-
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e3a51b74e275..75718fa28260 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -194,7 +194,6 @@ extern const struct bpf_func_proto 
bpf_map_lookup_elem_proto;
 extern const struct bpf_func_proto bpf_map_update_elem_proto;
 extern const struct bpf_func_proto bpf_map_delete_elem_proto;
 
-extern const struct bpf_func_proto bpf_perf_event_read_proto;
 extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
 extern const struct bpf_func_proto bpf_tail_call_proto;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e3cfe46b074f..3f4c99e06c6b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -292,16 +292,23 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map 
*map, int fd)
 
attr = perf_event_attrs(event);
if (IS_ERR(attr))
-   return (void *)attr;
+   goto err;
 
-   if (attr->type != PERF_TYPE_RAW &&
-   !(attr->type == PERF_TYPE_SOFTWARE &&
- attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
-   attr->type != PERF_TYPE_HARDWARE) {
-   perf_event_release_kernel(event);
-   return ERR_PTR(-EINVAL);
-   }
-   return event;
+   if (attr->inherit)
+   goto err;
+
+   if (attr->type == PERF_TYPE_RAW)
+   return event;
+
+   if (attr->type == PERF_TYPE_HARDWARE)
+   return event;
+
+   if (attr->type == PERF_TYPE_SOFTWARE &&
+   attr->config == PERF_COUNT_SW_BPF_OUTPUT)
+   return event;
+err:
+   perf_event_release_kernel(event);
+   return ERR_PTR(-EINVAL);
 }
 
 static void perf_event_fd_array_put_ptr(void *ptr)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 47febbe7998e..003df3887287 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -199,6 +199,11 @@ static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, 
u64 r4, u64 r5)
if (!event)
return -ENOENT;
 
+   /* make sure event is local and doesn't have pmu::count */
+   if (event->oncpu != smp_processor_id() ||
+   event->pmu->count)
+   return -EINVAL;
+
/*
 * we don't know if the function is run successfully by the
 * return value. It can be judged in other places, such as
@@ -207,7 +212,7 @@ static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, 
u64 r4, u64 r5)
return perf_event_read_local(event);
 }
 
-const struct bpf_func_proto bpf_perf_event_read_proto = {
+static const struct bpf_func_proto bpf_perf_event_read_proto = {
.func   = bpf_perf_event_read,
.gpl_only   = false,
.ret_type   = RET_INTEGER,
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] xen-netfront: limit max queues number to online cpus

2015-10-22 Thread Joe Jin
Should not allocate queues number more than online cpus.

Signed-off-by: Joe Jin 
Cc: Boris Ostrovsky 
Cc: Konrad Rzeszutek Wilk 
Cc: David S. Miller 
---
 drivers/net/xen-netfront.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f821a97..1eebd4e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -2137,7 +2137,7 @@ static int __init netif_init(void)
/* Allow as many queues as there are CPUs if user has not
 * specified a value.
 */
-   if (xennet_max_queues == 0)
+   if (xennet_max_queues == 0 || xennet_max_queues > num_online_cpus())
xennet_max_queues = num_online_cpus();
 
return xenbus_register_frontend(_driver);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[net 2/2] i40e: fix annoying message

2015-10-22 Thread Jeff Kirsher
From: Jesse Brandeburg 

The driver was printing a message about not being able
to assign VMDq because of a lack of MSI-X vectors.

This was because a line was missing that initialized a variable,
simply a merge error.

Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index dd44faf..3dd26cd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7911,6 +7911,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
if (pf->hw.func_caps.vmdq) {
pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
pf->flags |= I40E_FLAG_VMDQ_ENABLED;
+   pf->num_vmdq_qps = i40e_default_queues_per_vmdq(pf);
}
 
 #ifdef I40E_FCOE
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >