date:20170313

Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases

2017-03-13 Thread Pravin Shelar

On Mon, Mar 13, 2017 at 1:14 PM, Andy Zhou  wrote:
> Thanks for the review. Please see comments inline.
>
> On Mon, Mar 13, 2017 at 12:08 AM, Pravin Shelar  wrote:
>> On Fri, Mar 10, 2017 at 4:51 PM, Andy Zhou  wrote:
>>> With the introduction of open flow 'clone' action, the OVS user space
>>> can now translate the 'clone' action into kernel datapath 'sample'
>>> action, with 100% probability, to ensure that the clone semantics,
>>> which is that the packet seen by the clone action is the same as the
>>> packet seen by the action after clone, is faithfully carried out
>>> in the datapath.
>>>
>>> While the sample action in the datapath has the matching semantics,
>>> its implementation is only optimized for its original use.
>>> Specifically, there are two limitations: First, there is a 3-level
>>> nesting restriction, enforced at the flow downloading time. This
>>> limit turns out to be too restrictive for the 'clone' use case.
>>> Second, the implementation avoids recursive calls only if the sample
>>> action list has a single userspace action.
>>>
>>> The main optimization implemented in this series removes the static
>>> nesting limit check and instead implements a run time recursion limit
>>> check, and recursion avoidance similar to that of the 'recirc' action.
>>> This optimization solves both issues #1 and #2 above.
>>>
>>> One related optimization attempts to avoid copying the flow key as
>>> long as the enclosed actions do not change the flow key. The
>>> detection is performed only once at the flow downloading time.
>>>
>>> Another related optimization is to rewrite the action list
>>> at flow downloading time in order to save the fast path from parsing
>>> the sample action list in its original form repeatedly.
>>>
>>> Signed-off-by: Andy Zhou 
>>> ---
>>>  include/uapi/linux/openvswitch.h |  13 
>>>  net/openvswitch/actions.c| 111 ++
>>>  net/openvswitch/datapath.h   |   1 +
>>>  net/openvswitch/flow_netlink.c   | 126 
>>> ---
>>>  4 files changed, 162 insertions(+), 89 deletions(-)
>>>
>> 
>> ...
>>>  static int sample(struct datapath *dp, struct sk_buff *skb,
>>>   struct sw_flow_key *key, const struct nlattr *attr,
>>> - const struct nlattr *actions, int actions_len)
>>> + bool last)
>>>  {
>>> -   const struct nlattr *acts_list = NULL;
>>> -   const struct nlattr *a;
>>> -   int rem;
>>> -   u32 cutlen = 0;
>>> -
>>> -   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>>> -a = nla_next(a, &rem)) {
>>> -   u32 probability;
>>> +   struct nlattr *actions;
>>> +   struct nlattr *sample_arg;
>>> +   struct sk_buff *clone_skb;
>>> +   struct sw_flow_key *orig = key;
>>> +   int rem = nla_len(attr);
>>> +   int err = 0;
>>> +   const struct sample_arg *arg;
>>>
>>> -   switch (nla_type(a)) {
>>> -   case OVS_SAMPLE_ATTR_PROBABILITY:
>>> -   probability = nla_get_u32(a);
>>> -   if (!probability || prandom_u32() > probability)
>>> -   return 0;
>>> -   break;
>>> +   /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>>> +   sample_arg = nla_data(attr);
>>> +   arg = nla_data(sample_arg);
>>> +   actions = nla_next(sample_arg, &rem);
>>>
>>> -   case OVS_SAMPLE_ATTR_ACTIONS:
>>> -   acts_list = a;
>>> -   break;
>>> -   }
>>> +   if ((arg->probability != U32_MAX) &&
>>> +   (!arg->probability || prandom_u32() > arg->probability)) {
>>> +   if (last)
>>> +   consume_skb(skb);
>> To simplify let the existing code in do_execute_action() handle
>> freeing skb in this case.
>
> In the 'last' case, this function always consumes skb. In case the
> probability test passes, this is done by not cloning the skb
> before passing the skb to 'do_execute_actions'.  In case the
> probability test does not pass, the skb has to be explicitly freed.
> Currently, the caller does not know which case it is.  Are you
> suggesting that we change the function signature to pass
> this information back to the caller? Or something else?
>>
I see, let's keep this code then.
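
The probability test in the quoted hunk can be read in isolation. The following is a minimal userspace sketch, not the datapath code: the helper name sample_skips() is hypothetical, and the random value is passed in as a parameter instead of coming from prandom_u32().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * sample action should be skipped for this packet.  'rnd' stands in
 * for the value the kernel would obtain from prandom_u32().
 */
static bool sample_skips(uint32_t probability, uint32_t rnd)
{
	/* U32_MAX means "always sample" (the clone case): no random test. */
	if (probability == UINT32_MAX)
		return false;

	/* Zero never samples; otherwise skip when rnd exceeds the threshold. */
	return probability == 0 || rnd > probability;
}
```

When this returns true and 'last' is set, the skb still has to be released, which is the case Andy describes above.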

Re: [PATCH net] tun: fix premature POLLOUT notification on tun devices

2017-03-13 Thread David Miller

From: Hannes Frederic Sowa 
Date: Mon, 13 Mar 2017 00:00:26 +0100

> aszlig observed failing ssh tunnels (-w) during initialization since
> commit cc9da6cc4f56e0 ("ipv6: addrconf: use stable address generator for
> ARPHRD_NONE"). We already had reports that the mentioned commit breaks
> Juniper VPN connections. I can't clearly say that the Juniper VPN client
> has the same problem, but it is worth a try to hint to this patch.
> 
> Because of the early generation of link local addresses, the kernel now
> can start asking for routers on the local subnet much earlier than usual.
> Those router solicitation packets arrive inside the ssh channels and
> should be transmitted to the tun fd before the configuration scripts
> might have upped the interface and made it ready for transmission.
> 
> ssh polls on the interface and receives back a POLL_OUT. It tries to send
> the early router solicitation packet to the tun interface.  Unfortunately
> it hasn't been up'ed yet by config scripts, thus failing with -EIO. ssh
> doesn't retry again and considers the tun interface broken forever.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
> Fixes: cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
> Cc: Bjørn Mork 
> Reported-by: Valdis Kletnieks 
> Cc: Valdis Kletnieks 
> Reported-by: Jonas Lippuner 
> Cc: Jonas Lippuner 
> Reported-by: aszlig 
> Cc: aszlig 
> Signed-off-by: Hannes Frederic Sowa 

Applied and queued up for -stable.

Re: [PATCH net] dccp: fix memory leak during tear-down of unsuccessful connection request

2017-03-13 Thread David Miller

From: Hannes Frederic Sowa 
Date: Mon, 13 Mar 2017 00:01:30 +0100

> This patch fixes a memory leak, which happens if the connection request
> is not fulfilled between parsing the DCCP options and handling the SYN
> (because e.g. the backlog is full), because we forgot to free the
> list of ack vectors.
> 
> Reported-by: Jianwen Ji 
> Signed-off-by: Hannes Frederic Sowa 

Applied and queued up for -stable.

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Alexei Starovoitov

On Mon, Mar 13, 2017 at 06:02:11PM -0700, Eric Dumazet wrote:
> On Mon, 2017-03-13 at 16:40 -0700, Alexei Starovoitov wrote:
> 
> > that's not how it works. It's a job of submitter to prove
> > that additional code doesn't cause regressions especially
> > when there are legitimate concerns.
> 
> This test was moved out of the mlx4_en_prepare_rx_desc() section into
> the XDP_TX code path.
> 
> 
> if (ring->page_cache.index > 0) {
> /* XDP uses a single page per frame */
> if (!frags->page) {
> ring->page_cache.index--;
> frags->page = 
> ring->page_cache.buf[ring->page_cache.index].page;
> frags->dma  = 
> ring->page_cache.buf[ring->page_cache.index].dma;
> }
> frags->page_offset = XDP_PACKET_HEADROOM;
> rx_desc->data[0].addr = cpu_to_be64(frags->dma +
> XDP_PACKET_HEADROOM);
> return 0;
> }
> 
> Can you check again your claim, because I see no additional cost
> for XDP_TX.

Let's look at what it was:
- xdp_tx xmits the page regardless whether driver can replenish
- at the end of the napi mlx4_en_refill_rx_buffers() will replenish
rx in bulk either from page_cache or by allocating one page at a time

after the changes:
- xdp_tx will check page_cache; if it's empty, it will try to do an
order 10 (ten!!) alloc, will fail, will try to alloc a single page,
will xmit the packet, and will place the just-allocated page into the rx ring.
On the next packet in the same napi loop, it will try to allocate
order 9 (since the cache is still empty), will fail, will try a single
page, succeed... next packet will try order 8 and so on.
And those spiky order 10 allocations will be happening 4 times a second
due to the new logic in mlx4_en_recover_from_oom().
We may get lucky and order 2 alloc will succeed, but before that
we will waste tons of cycles.
If an attacker somehow makes existing page recycling logic not effective,
the xdp performance will be limited by order0 page allocator.
Not great, but still acceptable.
After this patch it will just tank due to this crazy scheme.
Yet you're not talking about this new behavior in the commit log.
You didn't test XDP at all and still claiming that everything is fine ?!
NACK

Re: [PATCH net, v2] dccp/tcp: fix routing redirect race

2017-03-13 Thread David Miller

From: Jon Maxwell 
Date: Fri, 10 Mar 2017 16:40:33 +1100

> As Eric Dumazet pointed out this also needs to be fixed in IPv6.
> v2: Contains the IPv6 tcp/IPv6 dccp patches as well.
> 
> We have seen a few incidents lately where a dst_entry has been freed
> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> dst_entry. If the conditions/timings are right a crash then ensues when the
> freed dst_entry is referenced later on. A common crashing back trace is:
 ...
> Of course it may happen with other NIC drivers as well.
> 
> It's found the freed dst_entry here:
 ...
> But there are other backtraces attributed to the same freed dst_entry in
> netfilter code as well.
> 
> All the vmcores showed 2 significant clues:
> 
> - Remote hosts behind the default gateway had always been redirected to a
> different gateway. A rtable/dst_entry will be added for that host, creating
> more dst_entries with lower reference counts and making this more probable.
> 
> - All vmcores showed a positive LockDroppedIcmps value, e.g.:
> 
> LockDroppedIcmps  267
> 
> A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
> regardless of whether user space has the socket locked. This can result in a
> race condition where the same dst_entry cached in sk->sk_dst_cache can be
> decremented twice for the same socket via:
> 
> do_redirect()->__sk_dst_check()-> dst_release().
> 
> Which leads to the dst_entry being prematurely freed with another socket
> pointing to it via sk->sk_dst_cache and a subsequent crash.
> 
> To fix this, skip do_redirect() if user space has the socket locked. Instead let
> the redirect take place later when user space does not have the socket
> locked.
> 
> The dccp/IPv6 code is very similar in this respect, so fixing it there too.
> 
> As Eric Garver pointed out, the following commit now invalidates routes, which
> can set the dst->obsolete flag so that ipv4_dst_check() returns null and
> triggers the dst_release().
> 
> Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
> Cc: Eric Garver 
> Cc: Hannes Sowa 
> Signed-off-by: Jon Maxwell 

Applied and queued up for -stable, thank you.
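
The shape of the fix described in the commit message can be shown with a toy model. This is only a sketch under stated assumptions: struct sock_model, its fields, and handle_redirect() are illustrative names, not kernel structures, and the real handlers do considerably more work.

```c
#include <stdbool.h>

/* Toy model of a socket; the field names are illustrative only. */
struct sock_model {
	bool owned_by_user;	/* user space currently holds the socket lock */
	int redirects_applied;	/* times the redirect handler ran */
	int redirects_skipped;	/* times it was skipped because of the lock */
};

/*
 * Model of the redirect branch of an ICMP error handler: run the
 * redirect only when user space does not own the socket, so the cached
 * route is never released concurrently with the lock holder.
 */
static void handle_redirect(struct sock_model *sk)
{
	if (sk->owned_by_user) {
		sk->redirects_skipped++;
		return;
	}
	sk->redirects_applied++;
}
```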

Re: [PATCH] ucc/hdlc: fix two little issue

2017-03-13 Thread David Miller

From: Zhao Qiang 
Date: Tue, 14 Mar 2017 09:38:33 +0800

> 1. modify bd_status from u32 to u16 in function hdlc_rx_done,
> because bd_status register is 16bits
> 2. write bd_length register before writing bd_status register
> 
> Signed-off-by: Zhao Qiang 

Applied, thank you.
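
The two points of the patch description can be illustrated with a small model. This is a sketch, not the ucc_hdlc driver: the descriptor layout, the BD_MODEL_READY bit, and bd_model_submit() are assumptions made for the example. The ordering presumably matters because the status word is what hands the descriptor to the hardware; real driver code would also use the appropriate I/O accessors and barriers.

```c
#include <stdint.h>

/* Illustrative descriptor layout and flag; not copied from the driver. */
struct bd_model {
	uint16_t status;	/* 16-bit status/control word */
	uint16_t length;	/* 16-bit data length */
};

#define BD_MODEL_READY	0x8000u	/* hypothetical "descriptor ready" bit */

static void bd_model_submit(volatile struct bd_model *bd,
			    uint16_t len, uint16_t flags)
{
	bd->length = len;				/* length first */
	bd->status = (uint16_t)(flags | BD_MODEL_READY);	/* status last */
}
```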

Re: [PATCH net-next 00/12] net: bcmgenet: add support for GENETv5

2017-03-13 Thread David Miller

From: Doug Berger 
Date: Mon, 13 Mar 2017 17:41:30 -0700

> This collection of patches contains changes related to adding
> support for the BCM7260, BCM7268, and BCM7271 devices that
> contain a new version of the GENET MAC IP block (v5) and a new
> fast ethernet (10/100BASE-T) internal PHY.
> 
> These patches were originally developed on top of the bug fixes
> of the "[PATCH v2 net 0/8] net: bcmgenet: minor bug fixes" patch
> set previously accepted into the net repository, but this
> submission is designed to be applied to the current net-next
> that does not yet include them. As a result there will be some
> merge conflicts that I would be happy to help resolve if desired.
> 
> Specifically, conflicts should occur with these patches from the
> minor bug fixes set:
> [PATCH v2 net 3/8] net: bcmgenet: reserved phy revisions must be checked first
> [PATCH v2 net 5/8] net: bcmgenet: synchronize irq0 status between the isr and task
> [PATCH v2 net 8/8] net: bcmgenet: decouple flow control from bcmgenet_tx_reclaim

Series applied, thanks Doug.

Re: [4.10+] sctp lockdep trace

2017-03-13 Thread Xin Long

On Tue, Mar 14, 2017 at 4:11 AM, Marcelo Ricardo Leitner
 wrote:
> On Mon, Mar 13, 2017 at 05:10:45PM -0300, Marcelo Ricardo Leitner wrote:
>> On Fri, Feb 24, 2017 at 05:21:10PM -0500, Dave Jones wrote:
>> > [  244.251557] ===
>> > [  244.263321] [ ERR: suspicious RCU usage.  ]
>> > [  244.274982] 4.10.0-think+ #7 Not tainted
>> > [  244.286511] ---
>> > [  244.298008] ./include/linux/rhashtable.h:602 suspicious 
>> > rcu_dereference_check() usage!
>> > [  244.309665]
>> >other info that might help us debug this:
>> >
>> > [  244.344629]
>> >rcu_scheduler_active = 2, debug_locks = 1
>> > [  244.367839] 1 lock held by trinity-c30/1781:
>> > [  244.379481]  #0:
>> > [  244.390848]  (
>> > [  244.402372] sk_lock-AF_INET
>> > [  244.413825] ){+.+.+.}
>> > [  244.425231] , at: [] sctp_sendmsg+0x330/0xfe0 [sctp]
>> > [  244.436774]
>> >stack backtrace:
>> > [  244.459620] CPU: 3 PID: 1781 Comm: trinity-c30 Not tainted 
>> > 4.10.0-think+ #7
>> > [  244.482790] Call Trace:
>> > [  244.494201]  dump_stack+0x68/0x93
>> > [  244.505598]  lockdep_rcu_suspicious+0xce/0xf0
>> > [  244.516924]  sctp_hash_transport+0x406/0x7e0 [sctp]
>> > [  244.528137]  ? sctp_endpoint_bh_rcv+0x171/0x290 [sctp]
>> > [  244.539243]  sctp_assoc_add_peer+0x290/0x3c0 [sctp]
>> > [  244.550291]  sctp_sendmsg+0x8f7/0xfe0 [sctp]
>> > [  244.561258]  ? rw_copy_check_uvector+0x8e/0x190
>> > [  244.572308]  ? import_iovec+0x3a/0xe0
>> > [  244.583232]  inet_sendmsg+0x49/0x1e0
>> > [  244.594150]  ___sys_sendmsg+0x2d4/0x300
>> > [  244.605002]  ? debug_smp_processor_id+0x17/0x20
>> > [  244.615844]  ? debug_smp_processor_id+0x17/0x20
>> > [  244.626533]  ? get_lock_stats+0x19/0x50
>> > [  244.637141]  __sys_sendmsg+0x54/0x90
>> > [  244.647817]  SyS_sendmsg+0x12/0x20
>> > [  244.658400]  do_syscall_64+0x66/0x1d0
>> > [  244.668990]  entry_SYSCALL64_slow_path+0x25/0x25
>> > [  244.679582] RIP: 0033:0x7fe095fcb0f9
>> > [  244.690079] RSP: 002b:7ffc5601b1d8 EFLAGS: 0246
>> > [  244.700704]  ORIG_RAX: 002e
>> > [  244.711248] RAX: ffda RBX: 002e RCX: 
>> > 7fe095fcb0f9
>> > [  244.721818] RDX: 0080 RSI: 5592de12ddc0 RDI: 
>> > 012d
>> > [  244.732282] RBP: 7fe0965c8000 R08: c000 R09: 
>> > 00dc
>> > [  244.742576] R10: 000302120088 R11: 0246 R12: 
>> > 0002
>> > [  244.752804] R13: 7fe0965c8048 R14: 7fe0966a1ad8 R15: 
>> > 7fe0965c8000
>> >
>> > [  244.775549] ===
>> > [  244.785875] [ ERR: suspicious RCU usage.  ]
>> > [  244.796951] 4.10.0-think+ #7 Not tainted
>> > [  244.807185] ---
>> > [  244.819213] ./include/linux/rhashtable.h:605 suspicious 
>> > rcu_dereference_check() usage!
>> > [  244.829420]
>> >other info that might help us debug this:
>> >
>> > [  244.859963]
>> >rcu_scheduler_active = 2, debug_locks = 1
>> > [  244.879766] 1 lock held by trinity-c30/1781:
>> > [  244.889953]  #0:
>> > [  244.90]  (
>> > [  244.909854] sk_lock-AF_INET
>> > [  244.919645] ){+.+.+.}
>> > [  244.929238] , at: [] sctp_sendmsg+0x330/0xfe0 [sctp]
>> > [  244.939167]
>> >stack backtrace:
>> > [  244.958506] CPU: 3 PID: 1781 Comm: trinity-c30 Not tainted 
>> > 4.10.0-think+ #7
>> > [  244.978102] Call Trace:
>> > [  244.987735]  dump_stack+0x68/0x93
>> > [  244.997112]  lockdep_rcu_suspicious+0xce/0xf0
>> > [  245.006588]  sctp_hash_transport+0x4ca/0x7e0 [sctp]
>> > [  245.016264]  ? sctp_endpoint_bh_rcv+0x171/0x290 [sctp]
>> > [  245.025797]  sctp_assoc_add_peer+0x290/0x3c0 [sctp]
>> > [  245.035380]  sctp_sendmsg+0x8f7/0xfe0 [sctp]
>> > [  245.044883]  ? rw_copy_check_uvector+0x8e/0x190
>> > [  245.054464]  ? import_iovec+0x3a/0xe0
>> > [  245.064016]  inet_sendmsg+0x49/0x1e0
>> > [  245.073516]  ___sys_sendmsg+0x2d4/0x300
>> > [  245.082967]  ? debug_smp_processor_id+0x17/0x20
>> > [  245.092448]  ? debug_smp_processor_id+0x17/0x20
>> > [  245.101850]  ? get_lock_stats+0x19/0x50
>> > [  245.70]  __sys_sendmsg+0x54/0x90
>> > [  245.120451]  SyS_sendmsg+0x12/0x20
>> > [  245.129649]  do_syscall_64+0x66/0x1d0
>> > [  245.138783]  entry_SYSCALL64_slow_path+0x25/0x25
>> > [  245.147678] RIP: 0033:0x7fe095fcb0f9
>> > [  245.156588] RSP: 002b:7ffc5601b1d8 EFLAGS: 0246
>> > [  245.165503]  ORIG_RAX: 002e
>> > [  245.174601] RAX: ffda RBX: 002e RCX: 
>> > 7fe095fcb0f9
>> > [  245.183861] RDX: 0080 RSI: 5592de12ddc0 RDI: 
>> > 012d
>> > [  245.193038] RBP: 7fe0965c8000 R08: c000 R09: 
>> > 00dc
>> > [  245.202214] R10: 000302120088 R11: 0246 R12: 
>> > 0002
>> > [  245.211261] R13: 7fe0965c8048 R14: 7fe0966a1ad8 R15: 
>> > 7fe0965c8000
>> >
>> >

Re: bond procfs hw addr prints

2017-03-13 Thread Jarod Wilson


On 2017-03-13 10:06 PM, Jarod Wilson wrote:

On 2017-03-13 8:28 PM, Jay Vosburgh wrote:

Jarod Wilson  wrote:


I've got a bug report for someone using Intel OPA devices in a bond, and
it appears these devices have a hardware address length of 20, opposed to
the typical 6 on ethernet. When they dump /proc/net/bonding/bondX, it only
prints the first 6 of the address, per %pM and mac_address_string(), while
sysfs for the interface does print the right thing, since it uses
sysfs_print_mac(), which takes a length argument.


This (20 octet MAC length) is true for any Infiniband device.


So the question is... What's the best route to take here? Expand %pM to
support variable length hardware addresses? Use sysfs_* in procfs?
Reinvent the wheel? Nothing I've tinkered with just yet feels very clean,
on top of not actually working yet. :)


sysfs_format_mac (not _print_mac) uses "%*phC", len, addr in its
format string.  Perhaps that format would be a better choice than %pM
for this case?


Ah, I'd failed to fully grasp how %phC worked, had actually tried it w/o
the * in there, and only the first char of the addr was printing.
Working on an updated version that uses %*phC properly, which does look
like the way to go here. (Didn't help that I was also looking at an
older codebase that didn't have the sysfs_format_mac de-duplication).
I'll try to have a tested patch in flight tomorrow.


Hm... One problem I'm seeing: perm_hwaddr[ETH_ALEN],
partner_system[ETH_ALEN], mac_addr_value[ETH_ALEN]. Looks like just
about all places where storage for only ETH_ALEN is available need to
be adjusted to maybe MAX_ADDR_LEN?


So I have something tested that uses %*phC, but only on ethernet
hardware so far, and I foresee bad juju for infiniband, because of that
ETH_ALEN issue...


--
Jarod Wilson
ja...@redhat.com
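
For readers without a kernel tree at hand, the effect of a length-aware format such as "%*phC" can be approximated in userspace. This is a minimal sketch and not the kernel's vsnprintf implementation; format_hwaddr() is a hypothetical name chosen for the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format 'len' octets of 'addr' as colon-separated hex into 'buf'.
 * Returns the number of characters written (excluding the NUL), or -1
 * if 'buf' is too small.  Userspace stand-in for illustration only.
 */
static int format_hwaddr(char *buf, size_t buflen,
			 const unsigned char *addr, size_t len)
{
	/* Each octet needs "xx" plus either a ':' separator or the final NUL. */
	size_t need = len ? len * 3 : 1;
	size_t i, pos = 0;

	if (buflen < need)
		return -1;

	for (i = 0; i < len; i++)
		pos += (size_t)snprintf(buf + pos, buflen - pos, "%s%02x",
					i ? ":" : "", addr[i]);
	buf[pos] = '\0';
	return (int)pos;
}
```

With this model a 6-octet ethernet address takes 17 characters and a 20-octet address takes 59, so any buffer sized around ETH_ALEN would need to grow as well, which matches the storage concern raised above.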

openvswitch conntrack and nat problem in first packet reply with RST

2017-03-13 Thread wenxu

Hi all,

Here is a simple test of conntrack and NAT in openvswitch.  I want to build a
stateful firewall with conntrack and then apply NAT.

netns1 port1 with ip 10.0.0.7
netns2 port2 with ip 1.1.1.7

netns1 (10.0.0.7) is source-NATed to 2.2.1.7 when it accesses netns2 (1.1.1.7).

1. # ovs-ofctl add-flow br0  'ip,in_port=1 actions=ct(table=1,zone=1)'
2. # ovs-ofctl add-flow br0  'ip,in_port=2 actions=ct(table=1,zone=1)'
3. # ovs-ofctl add-flow br0  'table=1, ct_state=+new+trk,tcp,in_port=1,tp_dst=123 actions=ct(commit,zone=1,nat(src=2.2.1.7)),output:2'
4. # ovs-ofctl add-flow br0  'table=1, ct_state=+est+trk,ip,in_port=2 actions=ct(commit,zone=1,nat(dst=10.0.0.7)),output:1'
5. # ovs-ofctl add-flow br0  'table=1, ct_state=+est+trk,ip,in_port=1  actions=ct(commit,zone=1,nat(src=2.2.1.7)),output:2'


I found that netns1 can access 1.1.1.7:123 when something is listening on
port 123 on 1.1.1.7 in netns2.

But if nothing is listening on port 123, the first RST packet sent in reply by
1.1.1.7 (while there is no kernel datapath rule yet) is not DNATed back to
10.0.0.7.  The second RST packet is OK (by then there is a kernel datapath rule,
which comes from the first RST packet).

# tcpdump -i eth0 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:44:13.575200 IP 10.0.0.7.39891 > 1.1.1.7.123: Flags [S], seq 93585, win 29200, options [mss 1460,sackOK,TS val 584707316 ecr 0,nop,wscale 7], length 0
14:44:13.576036 IP 1.1.1.7.123 > 2.2.1.7.39891: Flags [R.], seq 0, ack 93586, win 0, length 0

But the datapath flow is correct
# ovs-dpctl dump-flows
recirc_id(0),in_port(7),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(zone=1),recirc(0x5a)
recirc_id(0x5a),in_port(7),ct_state(+new+trk),eth_type(0x0800),ipv4(proto=6,frag=no),tcp(dst=123), packets:0, bytes:0, used:never, actions:ct(commit,zone=1,nat(src=2.2.1.7)),8
recirc_id(0),in_port(8),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(zone=1),recirc(0x5b)
recirc_id(0x5b),in_port(8),ct_state(-new+est+trk),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=1,nat(dst=10.0.0.7)),7


I think the problem lies in how PACKET-OUT interacts with the RST packet.

There are two packet-outs, one for rule 2 and one for rule 4. The rule 2 packet
goes through connection tracking, which sees that it is an RST packet and deletes
the conntrack entry. As a result, the second packet (from rule 4) can't find the
conntrack entry to do the dst-nat.

In the netfilter/nf_conntrack_proto_tcp.c file:
 if (!test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
/* If only reply is a RST, we can consider ourselves not to
   have an established connection: this is a fairly common
   problem case, so we can delete the conntrack
   immediately.  --RR */
if (th->rst ) {
nf_ct_kill_acct(ct, ctinfo, skb);
return NF_ACCEPT;
}
}


A switch should be added to keep this conntrack entry from being deleted:

if (!test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
/* If only reply is a RST, we can consider ourselves not to
   have an established connection: this is a fairly common
   problem case, so we can delete the conntrack
   immediately.  --RR */
-if (th->rst ) {
+if (th->rst && !nf_ct_tcp_rst_no_kill) {
nf_ct_kill_acct(ct, ctinfo, skb);
return NF_ACCEPT;
}


BR
wenxu

Re: [PATCH net-next 02/12] net: phy: bcm7xxx: add support for 28nm EPHY

2017-03-13 Thread Andrew Lunn

On Mon, Mar 13, 2017 at 07:06:25PM -0700, Doug Berger wrote:
> On 03/13/2017 06:06 PM, Andrew Lunn wrote:
> > On Mon, Mar 13, 2017 at 05:41:32PM -0700, Doug Berger wrote:
> >> +static int bcm7xxx_28nm_ephy_01_afe_config_init(struct phy_device *phydev)
> >> +{
> >> +  int ret;
> >> +
> >> +  /* set shadow mode 2 */
> >> +  ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
> >> + MII_BCM7XXX_SHD_MODE_2, 0);
> >> +  if (ret < 0)
> >> +  return ret;
> >> +
> >> +  /* Set current trim values INT_trim = -1, Ext_trim =0 */
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_BIAS_TRIM, 0x3BE0);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> >> +
> >> +  /* Cal reset */
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> >> +  MII_BCM7XXX_SHD_3_TL4);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> > 
> > Hi Doug
> > 
> > It would be nice to have a few blank lines here and there...
> > 
> Thanks for taking the time to review this.
> 
> In general I try to keep lines of related functionality together and use
> the blank lines to help identify boundaries.  In this particular case, I
> believe it is clearer to keep the code that may return an error code
> together with the code that tests for the error.

Hi Doug

I agree with that. Which is why i placed the comment between the goto
and the next block of code. This is where i think there should be a
blank line, to separate it from setting the trim values.

> > return phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
> > MII_BCM7XXX_SHD_MODE_2);
> > 
> The trouble here is that currently the phy_set_clr_bits() function
> returns the value written or a negative error and the function
> bcm7xxx_28nm_ephy_01_afe_config_init() is supposed to return 0 on
> success and non-zero on failure so this would not have the same
> functionality.

Ah, O.K. No problem.

> >> +  /* Advertise supported modes */
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> >> +  MII_BCM7XXX_SHD_3_AN_EEE_ADV);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> > 
> > blank...
> > 
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> >> +  MDIO_EEE_100TX);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> Here the two phy_write() calls are required to "/* Advertise supported
> modes */" (one sets an address and the other specifies the data to write
> to that address) so I kept them together to imply an association with
> the preceding comment.

O.K, i probably would have written a little helper function. And you
seem to have this repeated a few times, so the helper would be used a
few times.

> >> +
> >> +  /* Restore Defaults */
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> >> +  MII_BCM7XXX_SHD_3_PCS_CTRL_2);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> >> +  MII_BCM7XXX_PCS_CTRL_2_DEF);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> Same here.
> 
> >> +
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> >> +  MII_BCM7XXX_SHD_3_EEE_THRESH);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> >> +  MII_BCM7XXX_EEE_THRESH_DEF);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> Here...
> 
> >> +
> >> +  /* Enable EEE autonegotiation */
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> >> +  MII_BCM7XXX_SHD_3_AN_STAT);
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> >> +  ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> >> +  (MII_BCM7XXX_AN_NULL_MSG_EN | MII_BCM7XXX_AN_EEE_EN));
> >> +  if (ret < 0)
> >> +  goto reset_shadow_mode;
> and here.
> 
> >> +
> >> +reset_shadow_mode:
> >> +  /* reset shadow mode 2 */
> >> +  ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
> >> + MII_BCM7XXX_SHD_MODE_2);
> >> +  if (ret < 0)
> >> +  return ret;
> >> +
> >> +  /* Restart autoneg */
> >> +  phy_write(phydev, MII_BMCR,
> >> +(BMCR_SPEED100 | BMCR_ANENABLE | BMCR_ANRESTART));
> >> +
> >> +  return 0;
> > 
> >   return phy_write(.); ?
> > 
> I would feel more comfortable with this if the return value of the
> struct mii_bus write member function was more clearly defined.  In our
> case, we return 0 on success so I would consider this change, but I
> would prefer a consensus that all mii_bus write functions return 0 on
> success before doing so.

You are right in that this is not clearly defined. But i just looked
through all the mdio drivers in drivers/net/phy and they all do return
0 for their write operation.

  Andrew
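
The helper suggested above for the repeated address/data write pairs might look roughly like the following. This is a self-contained model rather than driver code: struct phy_model, model_phy_write(), model_shd2_write(), and the MODEL_* register numbers are all placeholders invented for the sketch, and the mock only imitates the "0 on success, negative on error" convention discussed in the thread.

```c
#include <stdint.h>

/* Placeholder register numbers for the model; not the driver's values. */
#define MODEL_SHD_2_ADDR_CTRL	0x0e
#define MODEL_SHD_2_CTRL_STAT	0x0f

/* Minimal stand-in for a PHY: a shadow register file behind two MII regs. */
struct phy_model {
	uint16_t shadow_addr;	/* last value written to ADDR_CTRL */
	uint16_t shadow[256];	/* shadow registers selected by shadow_addr */
	int fail_after;		/* writes allowed before an error; <0 = never fail */
};

/* Mock of a PHY register write: 0 on success, negative on injected failure. */
static int model_phy_write(struct phy_model *phy, unsigned int reg, uint16_t val)
{
	if (phy->fail_after == 0)
		return -5;	/* stands in for -EIO */
	if (phy->fail_after > 0)
		phy->fail_after--;

	if (reg == MODEL_SHD_2_ADDR_CTRL)
		phy->shadow_addr = val;
	else if (reg == MODEL_SHD_2_CTRL_STAT)
		phy->shadow[phy->shadow_addr & 0xff] = val;
	return 0;
}

/* Hypothetical helper: select a shadow register, then write its data. */
static int model_shd2_write(struct phy_model *phy, uint16_t addr, uint16_t val)
{
	int ret;

	ret = model_phy_write(phy, MODEL_SHD_2_ADDR_CTRL, addr);
	if (ret < 0)
		return ret;

	return model_phy_write(phy, MODEL_SHD_2_CTRL_STAT, val);
}
```

Each "select address, write data, check both" sequence in the quoted patch would then collapse to one call followed by a single error check.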

Re: [PATCH] usbnet: smsc95xx: Reduce logging noise

2017-03-13 Thread Guenter Roeck

On 03/13/2017 03:32 PM, David Miller wrote:

From: Guenter Roeck 
Date: Fri, 10 Mar 2017 17:45:21 -0800

An insert/remove stress test generated the following log message sequence.

...

Use netdev_dbg() instead of netdev_warn() for the repeating messages
to reduce logging noise.

Signed-off-by: Guenter Roeck 

The problem I have with changes like this is that outside of your
stress test situation these messages are extremely useful but will
now be hidden making diagnosis of problems more difficult.

In general I agree, but in this case the calling code also generates
another set of error messages, which I considered a bit excessive.

No problem, though, we can live with the noise.

Guenter

RE: [Intel-wired-lan] [PATCH] net: intel: ixgb: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Philippe Reynes
> Sent: Sunday, February 5, 2017 3:11 PM
> To: Kirsher, Jeffrey T ; da...@davemloft.net
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org; Philippe Reynes 
> Subject: [Intel-wired-lan] [PATCH] net: intel: ixgb: use new api
> ethtool_{get|set}_link_ksettings
> 
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c |   39 ++-
> -
>  1 files changed, 23 insertions(+), 16 deletions(-)
> 

Tested-by: Aaron Brown 
Well, compile / static tested by:  I probably have an ixgb adapter somewhere as
well as a PCI-X slot that might work for it, but am not sure I can find them or
that they still function.  If I stumble upon one I'll fire it up and run it
through some paces, but the variants of this patch for other drivers seem good
and I don't want to hold this patch up any longer.

Re: [PATCH net-next 02/12] net: phy: bcm7xxx: add support for 28nm EPHY

2017-03-13 Thread Doug Berger

On 03/13/2017 06:06 PM, Andrew Lunn wrote:
> On Mon, Mar 13, 2017 at 05:41:32PM -0700, Doug Berger wrote:
>> +static int bcm7xxx_28nm_ephy_01_afe_config_init(struct phy_device *phydev)
>> +{
>> +int ret;
>> +
>> +/* set shadow mode 2 */
>> +ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
>> +   MII_BCM7XXX_SHD_MODE_2, 0);
>> +if (ret < 0)
>> +return ret;
>> +
>> +/* Set current trim values INT_trim = -1, Ext_trim =0 */
>> +ret = phy_write(phydev, MII_BCM7XXX_SHD_2_BIAS_TRIM, 0x3BE0);
>> +if (ret < 0)
>> +goto reset_shadow_mode;
>> +
>> +/* Cal reset */
>> +ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
>> +MII_BCM7XXX_SHD_3_TL4);
>> +if (ret < 0)
>> +goto reset_shadow_mode;
> 
> Hi Doug
> 
> It would be nice to have a few blank lines here and there...
> 
Thanks for taking the time to review this.

In general I try to keep lines of related functionality together and use
the blank lines to help identify boundaries.  In this particular case, I
believe it is clearer to keep the code that may return an error code
together with the code that tests for the error.

Perhaps this would be a better alternative:
+   /* Set current trim values INT_trim = -1, Ext_trim =0 */
+   if (phy_write(phydev, MII_BCM7XXX_SHD_2_BIAS_TRIM, 0x3BE0) < 0)
+   goto reset_shadow_mode;
+
+   /* Cal reset */
+   if (phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
+ MII_BCM7XXX_SHD_3_TL4) < 0)
+   goto reset_shadow_mode;

>> +ret = phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
>> +   MII_BCM7XXX_TL4_RST_MSK, 0);
>> +if (ret < 0)
>> +goto reset_shadow_mode;
>> +
>> +/* Cal reset disable */
>> +ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
>> +MII_BCM7XXX_SHD_3_TL4);
>> +if (ret < 0)
>> +goto reset_shadow_mode;
> 
> ... just to break things up into readable chunks.
> 
Maybe:
+   if (phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
+MII_BCM7XXX_TL4_RST_MSK, 0) < 0)
+   goto reset_shadow_mode;
+
+   /* Cal reset disable */
+   if (phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
+ MII_BCM7XXX_SHD_3_TL4) < 0)
+   goto reset_shadow_mode;

>> +ret = phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
>> +   0, MII_BCM7XXX_TL4_RST_MSK);
>> +if (ret < 0)
>> +goto reset_shadow_mode;
>> +
>> +reset_shadow_mode:
>> +/* reset shadow mode 2 */
>> +ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
>> +   MII_BCM7XXX_SHD_MODE_2);
>> +if (ret < 0)
>> +return ret;
>> +
>> +return 0;
> 
> How about:
> 
>   return phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
>   MII_BCM7XXX_SHD_MODE_2);
> 
The trouble here is that currently the phy_set_clr_bits() function
returns the value written or a negative error and the function
bcm7xxx_28nm_ephy_01_afe_config_init() is supposed to return 0 on
success and non-zero on failure so this would not have the same
functionality.  I suppose we could change the phy_set_clr_bits() API to
return 0 on success to make this work, but I wasn't trying to change
preexisting functionality in this file.
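
The return-value mismatch described here can be made concrete with a small model. This is a sketch under assumptions: model_set_clr_bits() only imitates the "value written or negative error" behavior described above, and model_leave_shadow_mode() is a hypothetical wrapper showing how a caller would normalize that to 0 on success.

```c
#include <stdint.h>

/*
 * Model of a set/clear helper that returns the value written (which is
 * non-negative) or a negative error.  'inject_err' must be 0 or a
 * negative error code; it exists only to exercise the error path.
 */
static int model_set_clr_bits(uint16_t *reg, uint16_t set, uint16_t clr,
			      int inject_err)
{
	if (inject_err)
		return inject_err;

	*reg = (uint16_t)((*reg & ~clr) | set);
	return *reg;
}

/* config_init-style wrapper: 0 on success, negative error on failure. */
static int model_leave_shadow_mode(uint16_t *reg, uint16_t mode_bit,
				   int inject_err)
{
	int ret = model_set_clr_bits(reg, 0, mode_bit, inject_err);

	return ret < 0 ? ret : 0;
}
```

Returning the helper's result directly would hand a positive register value to a caller that expects 0 on success whenever other bits remain set in the register.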

>> +/* The 28nm EPHY does not support Clause 45 (MMD) used by bcm-phy-lib */
>> +static int bcm7xxx_28nm_ephy_apd_enable(struct phy_device *phydev)
>> +{
>> +int ret;
>> +
>> +/* set shadow mode 1 */
>> +ret = phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST,
>> +   MII_BRCM_FET_BT_SRE, 0);
>> +if (ret < 0)
>> +return ret;
>> +
>> +/* Enable auto-power down */
>> +ret = phy_set_clr_bits(phydev, MII_BRCM_FET_SHDW_AUXSTAT2,
>> +   MII_BRCM_FET_SHDW_AS2_APDE, 0);
>> +if (ret < 0)
>> +return ret;
>> +
>> +/* reset shadow mode 1 */
>> +ret = phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST, 0,
>> +   MII_BRCM_FET_BT_SRE);
>> +if (ret < 0)
>> +return ret;
>> +
>> +return 0;
> 
> How about just
> 
> return phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST, 0,
>  MII_BRCM_FET_BT_SRE);
> 
Same remark as above.

>> +}
>> +
>> +static int bcm7xxx_28nm_ephy_eee_enable(struct phy_device *phydev)
>> +{
>> +int ret;
>> +
>> +/* set shadow mode 2 */
>> +ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
>> +   MII_BCM7XXX_SHD_MODE_2, 0);
>> +if (ret < 0)
>> +return ret;
>> +
>> +/* Advertise supported modes */
>> +ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
>> +MII_BCM7XXX_SHD_3_AN_EEE_ADV);
>> +if (ret < 0)
>> +goto reset_shadow_mode;
> 
> blank...
> 
>> +ret =

Re: bond procfs hw addr prints

2017-03-13 Thread Jarod Wilson


On 2017-03-13 8:28 PM, Jay Vosburgh wrote:

Jarod Wilson  wrote:


I've got a bug report for someone using Intel OPA devices in a bond, and
it appears these devices have a hardware address length of 20, as opposed to
the typical 6 on ethernet. When they dump /proc/net/bonding/bondX, it only
prints the first 6 octets of the address, per %pM and mac_address_string(),
while sysfs for the interface does print the right thing, since it uses
sysfs_print_mac(), which takes a length argument.


This (20 octet MAC length) is true for any Infiniband device.


So the question is... What's the best route to take here? Expand %pM to
support variable length hardware addresses? Use sysfs_* in procfs?
Reinvent the wheel? Nothing I've tinkered with just yet feels very clean,
on top of not actually working yet. :)


sysfs_format_mac (not _print_mac) uses "%*phC", len, addr in its
format string.  Perhaps that format would be a better choice than %pM
for this case?


Ah, I'd failed to fully grasp how %phC worked, had actually tried it w/o 
the * in there, and only the first char of the addr was printing. 
Working on an updated version that uses %*phC properly, which does look 
like the way to go here. (Didn't help that I was also looking at an 
older codebase that didn't have the sysfs_format_mac de-duplication). 
I'll try to have a tested patch in flight tomorrow.
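
For readers unfamiliar with the format, the sketch below imitates in userspace what "%*phC", len, addr produces: every octet as two hex digits, separated by colons, for whatever length is passed. format_hw_addr() is a hypothetical helper written for illustration only; it is neither a kernel API nor the eventual bonding patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace approximation of "%*phC": writes len octets of addr into
 * buf as colon-separated two-digit hex.  Stops early rather than
 * truncating an octet if buf is too small.  Returns the number of
 * characters written, not counting the terminating NUL.
 */
static size_t format_hw_addr(char *buf, size_t size,
			     const unsigned char *addr, size_t len)
{
	size_t off = 0;
	size_t i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < len; i++) {
		/* optional ':' plus two digits plus the NUL */
		size_t need = (i ? 3 : 2) + 1;

		if (size - off < need)
			break;
		off += (size_t)snprintf(buf + off, size - off, "%s%02x",
					i ? ":" : "", addr[i]);
	}

	return off;
}
```

A 6-octet ethernet address comes out as 17 characters and a 20-octet Infiniband-style address as 59, which is why a fixed 6-octet formatter such as %pM cuts the latter short.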


--
Jarod Wilson
ja...@redhat.com

[PATCH] ucc/hdlc: fix two little issue

2017-03-13 Thread Zhao Qiang

1. modify bd_status from u32 to u16 in function hdlc_rx_done,
because bd_status register is 16bits
2. write bd_length register before writing bd_status register

Signed-off-by: Zhao Qiang 
---
 drivers/net/wan/fsl_ucc_hdlc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index a5045b5..6742ae6 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -381,8 +381,8 @@ static netdev_tx_t ucc_hdlc_tx(struct sk_buff *skb, struct net_device *dev)
/* set bd status and length */
bd_status = (bd_status & T_W_S) | T_R_S | T_I_S | T_L_S | T_TC_S;
 
-   iowrite16be(bd_status, &bd->status);
iowrite16be(skb->len, &bd->length);
+   iowrite16be(bd_status, &bd->status);
 
/* Move to next BD in the ring */
if (!(bd_status & T_W_S))
@@ -457,7 +457,7 @@ static int hdlc_rx_done(struct ucc_hdlc_private *priv, int rx_work_limit)
struct sk_buff *skb;
hdlc_device *hdlc = dev_to_hdlc(dev);
struct qe_bd *bd;
-   u32 bd_status;
+   u16 bd_status;
u16 length, howmany = 0;
u8 *bdbuffer;
int i;
-- 
2.1.0.27.g96db324
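
The commit message does not say why the length must be written first. A plausible reading, offered here as an assumption only, is that writing the status word with the ready bit set hands the descriptor to the controller, which may then read the length at any moment. The userspace mock below models that assumption; struct mock_bd, MOCK_BD_READY and the helpers are all hypothetical and do not describe the real QE hardware.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_BD_READY 0x8000u	/* hypothetical "owned by hardware" bit */

struct mock_bd {
	uint16_t status;
	uint16_t length;
	uint16_t latched_length;	/* what the mock hardware saw */
};

/* Mock hardware: latches the length the instant READY is written. */
static void mock_write_status(struct mock_bd *bd, uint16_t status)
{
	bd->status = status;
	if (status & MOCK_BD_READY)
		bd->latched_length = bd->length;
}

static void mock_write_length(struct mock_bd *bd, uint16_t length)
{
	bd->length = length;
}

/* Old order: status first, so the hardware can see a stale length. */
static uint16_t queue_status_first(struct mock_bd *bd, uint16_t len)
{
	mock_write_status(bd, MOCK_BD_READY);
	mock_write_length(bd, len);
	return bd->latched_length;
}

/* Patched order: length is in place before ownership is handed over. */
static uint16_t queue_length_first(struct mock_bd *bd, uint16_t len)
{
	mock_write_length(bd, len);
	mock_write_status(bd, MOCK_BD_READY);
	return bd->latched_length;
}
```

Under this model, a descriptor that starts with length 0 is latched with 0 in the old order and with the real frame length in the patched order.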

RE: [Intel-wired-lan] [PATCH] net: intel: igbvf: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Philippe Reynes
> Sent: Sunday, February 5, 2017 2:55 PM
> To: Kirsher, Jeffrey T ; da...@davemloft.net
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org; Philippe Reynes 
> Subject: [Intel-wired-lan] [PATCH] net: intel: igbvf: use new api
> ethtool_{get|set}_link_ksettings
> 
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/net/ethernet/intel/igbvf/ethtool.c |   38 
> ++--
>  1 files changed, 19 insertions(+), 19 deletions(-)

Tested-by: Aaron Brown

RE: [Intel-wired-lan] [PATCH] net: intel: igb: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Philippe Reynes
> Sent: Sunday, February 5, 2017 9:56 AM
> To: Kirsher, Jeffrey T ; da...@davemloft.net
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org; Philippe Reynes 
> Subject: [Intel-wired-lan] [PATCH] net: intel: igb: use new api
> ethtool_{get|set}_link_ksettings
> 
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c |  108 ++---
> -
>  1 files changed, 59 insertions(+), 49 deletions(-)
> 

Tested-by: Aaron Brown

Re: [PATCH] net: ethernet: aquantia: set net_device mtu when mtu is changed

2017-03-13 Thread Pavel Belous



On 03/14/2017 02:07 AM, David Arcari wrote:

When the aquantia device mtu is changed the net_device structure is not
updated.  As a result the ip command does not properly reflect the mtu change.

Commit 5513e16421cb incorrectly assumed that __dev_set_mtu() was making the
assignment ndev->mtu = new_mtu;  This is not true in the case where the driver
has a ndo_change_mtu routine.

Fixes: 5513e16421cb ("net: ethernet: aquantia: Fixes for aq_ndev_change_mtu")

Cc: Pavel Belous 
Signed-off-by: David Arcari 
---
 drivers/net/ethernet/aquantia/atlantic/aq_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_main.c b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
index dad6362..d05fbfd 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_main.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
@@ -98,6 +98,7 @@ static int aq_ndev_change_mtu(struct net_device *ndev, int new_mtu)

if (err < 0)
goto err_exit;
+   ndev->mtu = new_mtu;

if (netif_running(ndev)) {
aq_ndev_close(ndev);



Tested-by: Pavel Belous
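
The behaviour the commit message describes can be sketched in userspace as follows. The mock_* types and functions are hypothetical simplifications and not the kernel's definitions; the dispatch in mock_dev_set_mtu() mirrors the point made above, that the core assigns the mtu itself only when the driver has no ndo_change_mtu routine.

```c
#include <assert.h>
#include <stddef.h>

struct mock_net_device;

struct mock_net_device_ops {
	int (*ndo_change_mtu)(struct mock_net_device *dev, int new_mtu);
};

struct mock_net_device {
	int mtu;
	const struct mock_net_device_ops *ops;
};

/* Core helper: defers entirely to the driver hook when one exists. */
static int mock_dev_set_mtu(struct mock_net_device *dev, int new_mtu)
{
	if (dev->ops && dev->ops->ndo_change_mtu)
		return dev->ops->ndo_change_mtu(dev, new_mtu);

	dev->mtu = new_mtu;
	return 0;
}

/* Driver hook that configures "hardware" but never updates dev->mtu. */
static int mock_change_mtu_buggy(struct mock_net_device *dev, int new_mtu)
{
	(void)dev;
	(void)new_mtu;
	return 0;
}

/* Driver hook with the one-line fix applied. */
static int mock_change_mtu_fixed(struct mock_net_device *dev, int new_mtu)
{
	dev->mtu = new_mtu;
	return 0;
}

static const struct mock_net_device_ops mock_buggy_ops = {
	.ndo_change_mtu = mock_change_mtu_buggy,
};

static const struct mock_net_device_ops mock_fixed_ops = {
	.ndo_change_mtu = mock_change_mtu_fixed,
};
```

With the buggy hook the call succeeds yet the device still reports the old mtu, which matches the symptom seen through the ip command.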

Re: [RFC PATCH] sock: add SO_RCVQUEUE_SIZE getsockopt

2017-03-13 Thread Eric Dumazet

On Mon, 2017-03-13 at 18:34 -0500, Josh Hunt wrote:

> In this particular case they really do want to know total # of bytes in 
> the receive queue, not the data bytes they can consume from an 
> application pov. The kernel currently only exposes this value through 
> netlink or /proc/net/udp from what I saw.
> 
> I believe Eric's suggestion in his previous mail was to export all of 
> these meminfo metrics via a single socket option call similar to how its 
> done in netlink. We could then use that for both call sites.
> 
> I agree that it would be useful to also have the data you and Eric are 
> suggesting exposed somewhere, the total # of skb->len bytes sitting in 
> the receive queue. I could add that as a second socket option.

Please note that the UDP stack does not maintain a per-socket sum(skb->len).

Implementing this in a system call would require locking the receive
queue (blocking BH) and iterating over a potentially huge skb list.

The alternative is to add a new socket field and add/subtract skb->len
for every packet added to or removed from the receive queue.

So I would prefer not to provide this information; it looks like quite
a lot of bloat.
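
The two options can be contrasted with a small userspace model. Everything below (struct mock_rcvq and its helpers) is hypothetical and only illustrates the trade-off; it is not how the UDP stack is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_QUEUE_CAP 16

/* Hypothetical receive queue: a ring of per-packet lengths plus the
 * extra per-socket field that option 2 would require.
 */
struct mock_rcvq {
	unsigned int len[MOCK_QUEUE_CAP];
	size_t head;
	size_t count;
	unsigned long queued_bytes;	/* running sum of lengths */
};

/* Option 2 costs one addition on every enqueue ... */
static int mock_enqueue(struct mock_rcvq *q, unsigned int len)
{
	if (q->count == MOCK_QUEUE_CAP)
		return -1;

	q->len[(q->head + q->count) % MOCK_QUEUE_CAP] = len;
	q->count++;
	q->queued_bytes += len;
	return 0;
}

/* ... and one subtraction on every dequeue. */
static int mock_dequeue(struct mock_rcvq *q)
{
	if (q->count == 0)
		return -1;

	q->queued_bytes -= q->len[q->head];
	q->head = (q->head + 1) % MOCK_QUEUE_CAP;
	q->count--;
	return 0;
}

/* Option 1: walk the whole queue at query time.  In a real stack the
 * queue would have to stay locked for the duration of this loop.
 */
static unsigned long mock_sum_by_walk(const struct mock_rcvq *q)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < q->count; i++)
		sum += q->len[(q->head + i) % MOCK_QUEUE_CAP];

	return sum;
}
```

Both approaches give the same answer; the difference is whether the cost is paid once per query (a locked O(n) walk) or on every packet (an extra field touched in the fast path).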

Re: [PATCH net-next 02/12] net: phy: bcm7xxx: add support for 28nm EPHY

2017-03-13 Thread Andrew Lunn

On Mon, Mar 13, 2017 at 05:41:32PM -0700, Doug Berger wrote:
> +static int bcm7xxx_28nm_ephy_01_afe_config_init(struct phy_device *phydev)
> +{
> + int ret;
> +
> + /* set shadow mode 2 */
> + ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
> +MII_BCM7XXX_SHD_MODE_2, 0);
> + if (ret < 0)
> + return ret;
> +
> + /* Set current trim values INT_trim = -1, Ext_trim =0 */
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_BIAS_TRIM, 0x3BE0);
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> + /* Cal reset */
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> + MII_BCM7XXX_SHD_3_TL4);
> + if (ret < 0)
> + goto reset_shadow_mode;

Hi Doug

It would be nice to have a few blank lines here and there...

> + ret = phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> +MII_BCM7XXX_TL4_RST_MSK, 0);
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> + /* Cal reset disable */
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> + MII_BCM7XXX_SHD_3_TL4);
> + if (ret < 0)
> + goto reset_shadow_mode;

... just to break things up into readable chunks.

> + ret = phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> +0, MII_BCM7XXX_TL4_RST_MSK);
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> +reset_shadow_mode:
> + /* reset shadow mode 2 */
> + ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
> +MII_BCM7XXX_SHD_MODE_2);
> + if (ret < 0)
> + return ret;
> +
> + return 0;

How about:

return phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
MII_BCM7XXX_SHD_MODE_2);

> +/* The 28nm EPHY does not support Clause 45 (MMD) used by bcm-phy-lib */
> +static int bcm7xxx_28nm_ephy_apd_enable(struct phy_device *phydev)
> +{
> + int ret;
> +
> + /* set shadow mode 1 */
> + ret = phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST,
> +MII_BRCM_FET_BT_SRE, 0);
> + if (ret < 0)
> + return ret;
> +
> + /* Enable auto-power down */
> + ret = phy_set_clr_bits(phydev, MII_BRCM_FET_SHDW_AUXSTAT2,
> +MII_BRCM_FET_SHDW_AS2_APDE, 0);
> + if (ret < 0)
> + return ret;
> +
> + /* reset shadow mode 1 */
> + ret = phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST, 0,
> +MII_BRCM_FET_BT_SRE);
> + if (ret < 0)
> + return ret;
> +
> + return 0;

How about just

return phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST, 0,
   MII_BRCM_FET_BT_SRE);


> +}
> +
> +static int bcm7xxx_28nm_ephy_eee_enable(struct phy_device *phydev)
> +{
> + int ret;
> +
> + /* set shadow mode 2 */
> + ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
> +MII_BCM7XXX_SHD_MODE_2, 0);
> + if (ret < 0)
> + return ret;
> +
> + /* Advertise supported modes */
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> + MII_BCM7XXX_SHD_3_AN_EEE_ADV);
> + if (ret < 0)
> + goto reset_shadow_mode;

blank...

> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> + MDIO_EEE_100TX);
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> + /* Restore Defaults */
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> + MII_BCM7XXX_SHD_3_PCS_CTRL_2);
> + if (ret < 0)
> + goto reset_shadow_mode;
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> + MII_BCM7XXX_PCS_CTRL_2_DEF);
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> + MII_BCM7XXX_SHD_3_EEE_THRESH);
> + if (ret < 0)
> + goto reset_shadow_mode;
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> + MII_BCM7XXX_EEE_THRESH_DEF);
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> + /* Enable EEE autonegotiation */
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
> + MII_BCM7XXX_SHD_3_AN_STAT);
> + if (ret < 0)
> + goto reset_shadow_mode;
> + ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
> + (MII_BCM7XXX_AN_NULL_MSG_EN | MII_BCM7XXX_AN_EEE_EN));
> + if (ret < 0)
> + goto reset_shadow_mode;
> +
> +reset_shadow_mode:
> + /* reset shadow mode 2 */
> + ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
> +MII_BCM7XXX_SHD_MODE_2);
> + if (ret < 0)
> + return ret;
> +
> + /* Restart autoneg */
> + phy_write(phydev, MII_BMCR,
> +   (BMCR_SPEED100 |

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Eric Dumazet

On Mon, 2017-03-13 at 16:40 -0700, Alexei Starovoitov wrote:

> that's not how it works. It's a job of submitter to prove
> that additional code doesn't cause regressions especially
> when there are legitimate concerns.

This test was moved out of the mlx4_en_prepare_rx_desc() section into
the XDP_TX code path.

if (ring->page_cache.index > 0) {
/* XDP uses a single page per frame */
if (!frags->page) {
ring->page_cache.index--;
frags->page = 
ring->page_cache.buf[ring->page_cache.index].page;
frags->dma  = 
ring->page_cache.buf[ring->page_cache.index].dma;
}
frags->page_offset = XDP_PACKET_HEADROOM;
rx_desc->data[0].addr = cpu_to_be64(frags->dma +
XDP_PACKET_HEADROOM);
return 0;
}

Can you check your claim again? I see no additional cost
for XDP_TX.

In fact I removed from the other paths (which are equally important I
believe) a test that had no use, so everybody should be happy.

Re: [PATCH net-next 12/12] net: bcmgenet: add support for the GENETv5 hardware

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> This commit adds support for the GENETv5 implementation.
> 
> The GENETv5 reports a major version of 6 instead of 5 so compensate
> for this when verifying the configuration of the driver.  Also the
> EPHY revision is now contained in the MDIO registers of the PHY so
> the EPHY revision of 0 in GENET_VER_FMT is expected for GENETv5.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 11/12] dt-bindings: net: update bcmgenet binding for GENETv5

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> The device tree documentation must be updated to reflect the new compatible
> strings "brcm,genet-v5" and "brcm,genet-mdio-v5" used by the GENETv5 driver.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 10/12] dt-bindings: net: document bcmgenet WoL interrupt

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> A third interrupt cell can be provided to optionally specify
> the interrupt used for handling Wake on LAN events.
> 
> Typically the wake up handling uses a separate interrupt
> controller, so the interrupts-extended property is used to
> accommodate this.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 09/12] net: bcmgenet: return EOPNOTSUPP for unknown ioctl commands

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> This commit changes the ioctl handling behavior to return the
> EOPNOTSUPP error code instead of the EINVAL error code when an
> unknown ioctl command value is detected.
> 
> It also removes some redundant parsing of the ioctl command value
> and allows the SIOCSHWTSTAMP value to be handled.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 08/12] net: bcmgenet: correct return value of __bcmgenet_tx_reclaim

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> The reclaim function should return the number of buffer descriptors
> reclaimed, not just the number corresponding to skb packets.
> 
> Also, remove the unnecessary computation when updating the consumer
> index.
> 
> While this is not a functional problem it could degrade performance
> of napi in a fragmented transmit stream.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 07/12] net: bcmgenet: clear status to reduce spurious interrupts

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> Since the DMA interrupt status is latched and the DMA servicing can be
> polled, it is a good idea to clear the latched status of a DMA interrupt
> before performing the service that would be invoked by the interrupt.
> 
> This prevents old status from causing spurious interrupts when the
> interrupt is unmasked at a later time.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 06/12] net: bcmgenet: remove handling of wol interrupts from isr0

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> The bcmgenet_wol_isr() handler performs the necessary processing for
> waking from a GENET event.  There is no necessary functionality behind
> servicing the UMAC_IRQ_MPD_R event in the handling of isr0.  Therefore
> the code that unmasks and masks this interrupt and that gets invoked
> in response to it is removed by this commit.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 05/12] net: bcmgenet: manage dma interrupts in napi code

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> This commit moves DMA interrupt enabling out of init_umac() and adds
> the masking of these interrupts to the napi enable and disable code.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 04/12] net: bcmgenet: remove meaningless lines

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> An assortment of non-functional lines are removed to reduce confusion
> and some typos in comments are corrected.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 03/12] net: bcmgenet: simplify circular pointer arithmetic

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> A 2's complement subtraction will always do a borrow, so masking
> off the sign bits is the same as conditionally adding (mask+1).
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 02/12] net: phy: bcm7xxx: add support for 28nm EPHY

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> This commit adds support for the internal fast ethernet 10/100 PHY
> found in the BCM7260, BCM7268, and BCM7271 devices.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 01/12] net: phy: bcm-phylib: replace obsolete EEE macro references

2017-03-13 Thread Florian Fainelli

On 03/13/2017 05:41 PM, Doug Berger wrote:
> The macros MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T are now
> considered obsolete and are replaced in the kernel with the generic
> macros MDIO_EEE_100TX and MDIO_EEE_1000T respectively.
> 
> Signed-off-by: Doug Berger 

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH net-next 02/12] net: phy: bcm7xxx: add support for 28nm EPHY

2017-03-13 Thread Doug Berger

This commit adds support for the internal fast ethernet 10/100 PHY
found in the BCM7260, BCM7268, and BCM7271 devices.

Signed-off-by: Doug Berger 
---
 drivers/net/phy/bcm7xxx.c | 215 +-
 include/linux/brcmphy.h   |   3 +
 2 files changed, 216 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index d1c2614dad3a..caa9f6e17f34 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -1,7 +1,7 @@
 /*
  * Broadcom BCM7xxx internal transceivers support.
  *
- * Copyright (C) 2014, Broadcom Corporation
+ * Copyright (C) 2014-2017 Broadcom
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -19,7 +19,7 @@
 
 /* Broadcom BCM7xxx internal PHY registers */
 
-/* 40nm only register definitions */
+/* EPHY only register definitions */
 #define MII_BCM7XXX_100TX_AUX_CTL      0x10
 #define MII_BCM7XXX_100TX_FALSE_CAR    0x13
 #define MII_BCM7XXX_100TX_DISC         0x14
@@ -27,6 +27,19 @@
 #define  MII_BCM7XXX_64CLK_MDIO        BIT(12)
 #define MII_BCM7XXX_TEST               0x1f
 #define  MII_BCM7XXX_SHD_MODE_2        BIT(2)
+#define MII_BCM7XXX_SHD_2_ADDR_CTRL    0xe
+#define MII_BCM7XXX_SHD_2_CTRL_STAT    0xf
+#define MII_BCM7XXX_SHD_2_BIAS_TRIM    0x1a
+#define MII_BCM7XXX_SHD_3_AN_EEE_ADV   0x3
+#define MII_BCM7XXX_SHD_3_PCS_CTRL_2   0x6
+#define  MII_BCM7XXX_PCS_CTRL_2_DEF    0x4400
+#define MII_BCM7XXX_SHD_3_AN_STAT      0xb
+#define  MII_BCM7XXX_AN_NULL_MSG_EN    BIT(0)
+#define  MII_BCM7XXX_AN_EEE_EN         BIT(1)
+#define MII_BCM7XXX_SHD_3_EEE_THRESH   0xe
+#define  MII_BCM7XXX_EEE_THRESH_DEF    0x50
+#define MII_BCM7XXX_SHD_3_TL4          0x23
+#define  MII_BCM7XXX_TL4_RST_MSK       (BIT(2) | BIT(1))
 
 /* 28nm only register definitions */
 #define MISC_ADDR(base, channel)   base, channel
@@ -286,6 +299,181 @@ static int phy_set_clr_bits(struct phy_device *dev, int location,
return v;
 }
 
+static int bcm7xxx_28nm_ephy_01_afe_config_init(struct phy_device *phydev)
+{
+   int ret;
+
+   /* set shadow mode 2 */
+   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
+  MII_BCM7XXX_SHD_MODE_2, 0);
+   if (ret < 0)
+   return ret;
+
+   /* Set current trim values INT_trim = -1, Ext_trim =0 */
+   ret = phy_write(phydev, MII_BCM7XXX_SHD_2_BIAS_TRIM, 0x3BE0);
+   if (ret < 0)
+   goto reset_shadow_mode;
+
+   /* Cal reset */
+   ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
+   MII_BCM7XXX_SHD_3_TL4);
+   if (ret < 0)
+   goto reset_shadow_mode;
+   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
+  MII_BCM7XXX_TL4_RST_MSK, 0);
+   if (ret < 0)
+   goto reset_shadow_mode;
+
+   /* Cal reset disable */
+   ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
+   MII_BCM7XXX_SHD_3_TL4);
+   if (ret < 0)
+   goto reset_shadow_mode;
+   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
+  0, MII_BCM7XXX_TL4_RST_MSK);
+   if (ret < 0)
+   goto reset_shadow_mode;
+
+reset_shadow_mode:
+   /* reset shadow mode 2 */
+   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0,
+  MII_BCM7XXX_SHD_MODE_2);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}
+
+/* The 28nm EPHY does not support Clause 45 (MMD) used by bcm-phy-lib */
+static int bcm7xxx_28nm_ephy_apd_enable(struct phy_device *phydev)
+{
+   int ret;
+
+   /* set shadow mode 1 */
+   ret = phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST,
+  MII_BRCM_FET_BT_SRE, 0);
+   if (ret < 0)
+   return ret;
+
+   /* Enable auto-power down */
+   ret = phy_set_clr_bits(phydev, MII_BRCM_FET_SHDW_AUXSTAT2,
+  MII_BRCM_FET_SHDW_AS2_APDE, 0);
+   if (ret < 0)
+   return ret;
+
+   /* reset shadow mode 1 */
+   ret = phy_set_clr_bits(phydev, MII_BRCM_FET_BRCMTEST, 0,
+  MII_BRCM_FET_BT_SRE);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}
+
+static int bcm7xxx_28nm_ephy_eee_enable(struct phy_device *phydev)
+{
+   int ret;
+
+   /* set shadow mode 2 */
+   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
+  MII_BCM7XXX_SHD_MODE_2, 0);
+   if (ret < 0)
+   return ret;
+
+   /* Advertise supported modes */
+   ret = phy_write(phydev, MII_BCM7XXX_SHD_2_ADDR_CTRL,
+   MII_BCM7XXX_SHD_3_AN_EEE_ADV);
+   if (ret < 0)
+   goto reset_shadow_mode;
+   ret = phy_write(phydev, MII_BCM7XXX_SHD_2_CTRL_STAT,
+   MDIO_EEE_100TX);
+

[PATCH net-next 00/12] net: bcmgenet: add support for GENETv5

2017-03-13 Thread Doug Berger

This collection of patches contains changes related to adding
support for the BCM7260, BCM7268, and BCM7271 devices that
contain a new version of the GENET MAC IP block (v5) and a new
fast ethernet (10/100BASE-T) internal PHY.

These patches were originally developed on top of the bug fixes
of the "[PATCH v2 net 0/8] net: bcmgenet: minor bug fixes" patch
set previously accepted into the net repository, but this
submission is designed to be applied to the current net-next
that does not yet include them. As a result there will be some
merge conflicts that I would be happy to help resolve if desired.

Specifically, conflicts should occur with these patches from the
minor bug fixes set:
[PATCH v2 net 3/8] net: bcmgenet: reserved phy revisions must be checked first
[PATCH v2 net 5/8] net: bcmgenet: synchronize irq0 status between the isr and task
[PATCH v2 net 8/8] net: bcmgenet: decouple flow control from bcmgenet_tx_reclaim

Doug Berger (12):
  net: phy: bcm-phylib: replace obsolete EEE macro references
  net: phy: bcm7xxx: add support for 28nm EPHY
  net: bcmgenet: simplify circular pointer arithmetic
  net: bcmgenet: remove meaningless lines
  net: bcmgenet: manage dma interrupts in napi code
  net: bcmgenet: remove handling of wol interrupts from isr0
  net: bcmgenet: clear status to reduce spurious interrupts
  net: bcmgenet: correct return value of __bcmgenet_tx_reclaim
  net: bcmgenet: return EOPNOTSUPP for unknown ioctl commands
  dt-bindings: net: document bcmgenet WoL interrupt
  dt-bindings: net: update bcmgenet binding for GENETv5
  net: bcmgenet: add support for the GENETv5 hardware

 .../devicetree/bindings/net/brcm,bcmgenet.txt  |  19 +-
 .../devicetree/bindings/net/brcm,unimac-mdio.txt   |   5 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 214 +++-
 drivers/net/ethernet/broadcom/genet/bcmgenet.h |  10 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c |  13 --
 drivers/net/ethernet/broadcom/genet/bcmmii.c   |  62 +++---
 drivers/net/phy/bcm-phy-lib.c  |   6 +-
 drivers/net/phy/bcm7xxx.c  | 215 -
 include/linux/brcmphy.h|   3 +
 9 files changed, 403 insertions(+), 144 deletions(-)

-- 
2.11.1

[PATCH net-next 03/12] net: bcmgenet: simplify circular pointer arithmetic

2017-03-13 Thread Doug Berger

A 2's complement subtraction will always do a borrow, so masking
off the sign bits is the same as conditionally adding (mask+1).

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index f92896835d2a..2c008b09c4e3 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1,7 +1,7 @@
 /*
  * Broadcom GENET (Gigabit Ethernet) controller driver
  *
- * Copyright (c) 2014 Broadcom Corporation
+ * Copyright (c) 2014-2017 Broadcom
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -1175,13 +1175,9 @@ static unsigned int __bcmgenet_tx_reclaim(struct net_device *dev,
unsigned int txbds_processed = 0;
 
/* Compute how many buffers are transmitted since last xmit call */
-   c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX);
-   c_index &= DMA_C_INDEX_MASK;
-
-   if (likely(c_index >= ring->c_index))
-   txbds_ready = c_index - ring->c_index;
-   else
-   txbds_ready = (DMA_C_INDEX_MASK + 1) - ring->c_index + c_index;
+   c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX)
+   & DMA_C_INDEX_MASK;
+   txbds_ready = (c_index - ring->c_index) & DMA_C_INDEX_MASK;
 
netif_dbg(priv, tx_done, dev,
  "%s ring=%d old_c_index=%u c_index=%u txbds_ready=%u\n",
@@ -1611,12 +1607,7 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
}
 
p_index &= DMA_P_INDEX_MASK;
-
-   if (likely(p_index >= ring->c_index))
-   rxpkttoprocess = p_index - ring->c_index;
-   else
-   rxpkttoprocess = (DMA_C_INDEX_MASK + 1) - ring->c_index +
-p_index;
+   rxpkttoprocess = (p_index - ring->c_index) & DMA_C_INDEX_MASK;
 
netif_dbg(priv, rx_status, dev,
  "RDMA: rxpkttoprocess=%d\n", rxpkttoprocess);
-- 
2.11.1
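
The equivalence claimed in the commit message can be checked with a standalone sketch. The function names and the mask parameter are illustrative assumptions, not the driver's identifiers; the mask is assumed to be of the form 2^n - 1 and both indices are assumed to be already masked, as they are in the patch.

```c
#include <assert.h>

/* Old form: branch on wrap-around and add (mask + 1) when needed. */
static unsigned int ring_delta_branch(unsigned int cur, unsigned int old,
				      unsigned int mask)
{
	if (cur >= old)
		return cur - old;

	return (mask + 1) - old + cur;
}

/* New form: unsigned subtraction wraps modulo 2^32, so masking off the
 * upper bits yields the same distance without a branch.
 */
static unsigned int ring_delta_masked(unsigned int cur, unsigned int old,
				      unsigned int mask)
{
	return (cur - old) & mask;
}

/* Exhaustively compares both forms for every index pair under mask.
 * Returns 1 if they always agree, 0 otherwise.
 */
static int ring_delta_forms_agree(unsigned int mask)
{
	unsigned int cur, old;

	for (cur = 0; cur <= mask; cur++)
		for (old = 0; old <= mask; old++)
			if (ring_delta_branch(cur, old, mask) !=
			    ring_delta_masked(cur, old, mask))
				return 0;

	return 1;
}
```

For example, with a 16-bit mask, an old index of 0xFFFE and a new index of 2, both forms give a distance of 4.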

[PATCH net-next 01/12] net: phy: bcm-phylib: replace obsolete EEE macro references

2017-03-13 Thread Doug Berger

The macros MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T are now
considered obsolete and are replaced in the kernel with the generic
macros MDIO_EEE_100TX and MDIO_EEE_1000T respectively.

Signed-off-by: Doug Berger 
---
 drivers/net/phy/bcm-phy-lib.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
index ab9ad689617c..9656dbeb5de5 100644
--- a/drivers/net/phy/bcm-phy-lib.c
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Broadcom Corporation
+ * Copyright (C) 2015-2017 Broadcom
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
@@ -221,9 +221,9 @@ int bcm_phy_set_eee(struct phy_device *phydev, bool enable)
return val;
 
if (enable)
-   val |= (MDIO_AN_EEE_ADV_100TX | MDIO_AN_EEE_ADV_1000T);
+   val |= (MDIO_EEE_100TX | MDIO_EEE_1000T);
else
-   val &= ~(MDIO_AN_EEE_ADV_100TX | MDIO_AN_EEE_ADV_1000T);
+   val &= ~(MDIO_EEE_100TX | MDIO_EEE_1000T);
 
phy_write_mmd_indirect(phydev, BCM_CL45VEN_EEE_ADV,
   MDIO_MMD_AN, (u32)val);
-- 
2.11.1

[PATCH net-next 06/12] net: bcmgenet: remove handling of wol interrupts from isr0

2017-03-13 Thread Doug Berger

The bcmgenet_wol_isr() handler performs the necessary processing for
waking from a GENET event.  There is no necessary functionality behind
servicing the UMAC_IRQ_MPD_R event in the handling of isr0.  Therefore
the code that unmasks and masks this interrupt and that gets invoked
in response to it is removed by this commit.

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 10 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 15 +--
 2 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 9be884021679..661ca1b39c89 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2455,13 +2455,6 @@ static void bcmgenet_irq_task(struct work_struct *work)
 
netif_dbg(priv, intr, priv->dev, "%s\n", __func__);
 
-   if (priv->irq0_stat & UMAC_IRQ_MPD_R) {
-   priv->irq0_stat &= ~UMAC_IRQ_MPD_R;
-   netif_dbg(priv, wol, priv->dev,
- "magic packet detected, waking up\n");
-   bcmgenet_power_up(priv, GENET_POWER_WOL_MAGIC);
-   }
-
/* Link UP/DOWN event */
if (priv->irq0_stat & UMAC_IRQ_LINK_EVENT) {
phy_mac_interrupt(priv->phydev,
@@ -2558,8 +2551,7 @@ static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
UMAC_IRQ_PHY_DET_F |
UMAC_IRQ_LINK_EVENT |
UMAC_IRQ_HFB_SM |
-   UMAC_IRQ_HFB_MM |
-   UMAC_IRQ_MPD_R)) {
+   UMAC_IRQ_HFB_MM)) {
/* all other interested interrupts handled in bottom half */
schedule_work(&priv->bcmgenet_irq_work);
}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c b/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c
index b97122926d3a..2fbd027f0148 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c
@@ -1,7 +1,7 @@
 /*
  * Broadcom GENET (Gigabit Ethernet) Wake-on-LAN support
  *
- * Copyright (c) 2014 Broadcom Corporation
+ * Copyright (c) 2014-2017 Broadcom
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -127,7 +127,6 @@ int bcmgenet_wol_power_down_cfg(struct bcmgenet_priv *priv,
enum bcmgenet_power_mode mode)
 {
struct net_device *dev = priv->dev;
-   u32 cpu_mask_clear;
int retries = 0;
u32 reg;
 
@@ -173,18 +172,12 @@ int bcmgenet_wol_power_down_cfg(struct bcmgenet_priv *priv,
bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
}
 
-   /* Enable the MPD interrupt */
-   cpu_mask_clear = UMAC_IRQ_MPD_R;
-
-   bcmgenet_intrl2_0_writel(priv, cpu_mask_clear, INTRL2_CPU_MASK_CLEAR);
-
return 0;
 }
 
 void bcmgenet_wol_power_up_cfg(struct bcmgenet_priv *priv,
   enum bcmgenet_power_mode mode)
 {
-   u32 cpu_mask_set;
u32 reg;
 
if (mode != GENET_POWER_WOL_MAGIC) {
@@ -201,10 +194,4 @@ void bcmgenet_wol_power_up_cfg(struct bcmgenet_priv *priv,
reg &= ~CMD_CRC_FWD;
bcmgenet_umac_writel(priv, reg, UMAC_CMD);
priv->crc_fwd_en = 0;
-
-   /* Stop monitoring magic packet IRQ */
-   cpu_mask_set = UMAC_IRQ_MPD_R;
-
-   /* Stop monitoring magic packet IRQ */
-   bcmgenet_intrl2_0_writel(priv, cpu_mask_set, INTRL2_CPU_MASK_SET);
 }
-- 
2.11.1

[PATCH net-next 04/12] net: bcmgenet: remove meaningless lines

2017-03-13 Thread Doug Berger

An assortment of non-functional lines are removed to reduce confusion
and some typos in comments are corrected.

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 2c008b09c4e3..22c92f5a9829 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -605,7 +605,7 @@ static int bcmgenet_set_coalesce(struct net_device *dev,
 
/* GENET TDMA hardware does not support a configurable timeout, but will
 * always generate an interrupt either after MBDONE packets have been
-* transmitted, or when the ring is emtpy.
+* transmitted, or when the ring is empty.
 */
if (ec->tx_coalesce_usecs || ec->tx_coalesce_usecs_high ||
ec->tx_coalesce_usecs_irq || ec->tx_coalesce_usecs_low)
@@ -1834,10 +1834,8 @@ static void bcmgenet_intr_disable(struct bcmgenet_priv 
*priv)
/* Mask all interrupts.*/
bcmgenet_intrl2_0_writel(priv, 0x, INTRL2_CPU_MASK_SET);
bcmgenet_intrl2_0_writel(priv, 0x, INTRL2_CPU_CLEAR);
-   bcmgenet_intrl2_0_writel(priv, 0, INTRL2_CPU_MASK_CLEAR);
bcmgenet_intrl2_1_writel(priv, 0x, INTRL2_CPU_MASK_SET);
bcmgenet_intrl2_1_writel(priv, 0x, INTRL2_CPU_CLEAR);
-   bcmgenet_intrl2_1_writel(priv, 0, INTRL2_CPU_MASK_CLEAR);
 }
 
 static void bcmgenet_link_intr_enable(struct bcmgenet_priv *priv)
@@ -1926,7 +1924,6 @@ static int init_umac(struct bcmgenet_priv *priv)
bcmgenet_intrl2_0_writel(priv, int0_enable, INTRL2_CPU_MASK_CLEAR);
bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
 
-   /* Enable rx/tx engine.*/
dev_dbg(kdev, "done init umac\n");
 
return 0;
@@ -2836,7 +2833,7 @@ static int bcmgenet_close(struct net_device *dev)
if (ret)
return ret;
 
-   /* Disable MAC transmit. TX DMA disabled have to done before this */
+   /* Disable MAC transmit. TX DMA disabled must be done before this */
umac_enable_set(priv, CMD_TX_EN, false);
 
/* tx reclaim */
@@ -3115,22 +3112,18 @@ static void bcmgenet_set_hw_params(struct bcmgenet_priv 
*priv)
bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
genet_dma_ring_regs = genet_dma_ring_regs_v4;
priv->dma_rx_chk_bit = DMA_RX_CHK_V3PLUS;
-   priv->version = GENET_V4;
} else if (GENET_IS_V3(priv)) {
bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
genet_dma_ring_regs = genet_dma_ring_regs_v123;
priv->dma_rx_chk_bit = DMA_RX_CHK_V3PLUS;
-   priv->version = GENET_V3;
} else if (GENET_IS_V2(priv)) {
bcmgenet_dma_regs = bcmgenet_dma_regs_v2;
genet_dma_ring_regs = genet_dma_ring_regs_v123;
priv->dma_rx_chk_bit = DMA_RX_CHK_V12;
-   priv->version = GENET_V2;
} else if (GENET_IS_V1(priv)) {
bcmgenet_dma_regs = bcmgenet_dma_regs_v1;
genet_dma_ring_regs = genet_dma_ring_regs_v123;
priv->dma_rx_chk_bit = DMA_RX_CHK_V12;
-   priv->version = GENET_V1;
}
 
/* enum genet_version starts at 1 */
@@ -3397,7 +3390,7 @@ static int bcmgenet_suspend(struct device *d)
if (ret)
return ret;
 
-   /* Disable MAC transmit. TX DMA disabled have to done before this */
+   /* Disable MAC transmit. TX DMA disabled must be done before this */
umac_enable_set(priv, CMD_TX_EN, false);
 
/* tx reclaim */
-- 
2.11.1

[PATCH net-next 11/12] dt-bindings: net: update bcmgenet binding for GENETv5

2017-03-13 Thread Doug Berger

The device tree documentation must be updated to reflect the new compatible
strings "brcm,genet-v5" and "brcm,genet-mdio-v5" used by the GENETv5 driver.

Signed-off-by: Doug Berger 
---
 Documentation/devicetree/bindings/net/brcm,bcmgenet.txt| 10 +-
 Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt |  5 +++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt 
b/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt
index 01a70463cbc5..26c77d985faf 100644
--- a/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt
+++ b/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt
@@ -2,7 +2,7 @@
 
 Required properties:
 - compatible: should contain one of "brcm,genet-v1", "brcm,genet-v2",
-  "brcm,genet-v3", "brcm,genet-v4".
+  "brcm,genet-v3", "brcm,genet-v4", "brcm,genet-v5".
 - reg: address and length of the register set for the device
 - interrupts and/or interrupts-extended: must be two cells, the first cell
   is the general purpose interrupt line, while the second cell is the
@@ -32,15 +32,15 @@ Optional properties:
 
 Required child nodes:
 
-- mdio bus node: this node should always be present regarless of the PHY
+- mdio bus node: this node should always be present regardless of the PHY
   configuration of the GENET instance
 
 MDIO bus node required properties:
 
 - compatible: should contain one of "brcm,genet-mdio-v1", "brcm,genet-mdio-v2"
-  "brcm,genet-mdio-v3", "brcm,genet-mdio-v4", the version has to match the
-  parent node compatible property (e.g: brcm,genet-v4 pairs with
-  brcm,genet-mdio-v4)
+  "brcm,genet-mdio-v3", "brcm,genet-mdio-v4", "brcm,genet-mdio-v5", the version
+  has to match the parent node compatible property (e.g: brcm,genet-v4 pairs
+  with brcm,genet-mdio-v4)
 - reg: address and length relative to the parent node base register address
 - #address-cells: address cell for MDIO bus addressing, should be 1
 - #size-cells: size of the cells for MDIO bus addressing, should be 0
diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt 
b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt
index ab0bb4247d14..4648948f7c3b 100644
--- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt
+++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt
@@ -2,8 +2,9 @@
 
 Required properties:
 - compatible: should one from "brcm,genet-mdio-v1", "brcm,genet-mdio-v2",
-  "brcm,genet-mdio-v3", "brcm,genet-mdio-v4" or "brcm,unimac-mdio"
-- reg: address and length of the regsiter set for the device, first one is the
+  "brcm,genet-mdio-v3", "brcm,genet-mdio-v4", "brcm,genet-mdio-v5" or
+  "brcm,unimac-mdio"
+- reg: address and length of the register set for the device, first one is the
   base register, and the second one is optional and for indirect accesses to
   larger than 16-bits MDIO transactions
 - reg-names: name(s) of the register must be "mdio" and optional 
"mdio_indir_rw"
-- 
2.11.1

[PATCH net-next 05/12] net: bcmgenet: manage dma interrupts in napi code

2017-03-13 Thread Doug Berger

This commit moves DMA interrupt enabling out of init_umac() and adds
the masking of these interrupts to the napi enable and disable code.

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 39 +++---
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 22c92f5a9829..9be884021679 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1862,8 +1862,6 @@ static int init_umac(struct bcmgenet_priv *priv)
int ret;
u32 reg;
u32 int0_enable = 0;
-   u32 int1_enable = 0;
-   int i;
 
dev_dbg(>pdev->dev, "bcmgenet: init_umac\n");
 
@@ -1890,12 +1888,6 @@ static int init_umac(struct bcmgenet_priv *priv)
 
bcmgenet_intr_disable(priv);
 
-   /* Enable Rx default queue 16 interrupts */
-   int0_enable |= UMAC_IRQ_RXDMA_DONE;
-
-   /* Enable Tx default queue 16 interrupts */
-   int0_enable |= UMAC_IRQ_TXDMA_DONE;
-
/* Configure backpressure vectors for MoCA */
if (priv->phy_interface == PHY_INTERFACE_MODE_MOCA) {
reg = bcmgenet_bp_mc_get(priv);
@@ -1913,16 +1905,7 @@ static int init_umac(struct bcmgenet_priv *priv)
if (priv->hw_params->flags & GENET_HAS_MDIO_INTR)
int0_enable |= (UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR);
 
-   /* Enable Rx priority queue interrupts */
-   for (i = 0; i < priv->hw_params->rx_queues; ++i)
-   int1_enable |= (1 << (UMAC_IRQ1_RX_INTR_SHIFT + i));
-
-   /* Enable Tx priority queue interrupts */
-   for (i = 0; i < priv->hw_params->tx_queues; ++i)
-   int1_enable |= (1 << i);
-
bcmgenet_intrl2_0_writel(priv, int0_enable, INTRL2_CPU_MASK_CLEAR);
-   bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
 
dev_dbg(kdev, "done init umac\n");
 
@@ -2055,22 +2038,33 @@ static void bcmgenet_init_tx_napi(struct bcmgenet_priv 
*priv)
 static void bcmgenet_enable_tx_napi(struct bcmgenet_priv *priv)
 {
unsigned int i;
+   u32 int0_enable = UMAC_IRQ_TXDMA_DONE;
+   u32 int1_enable = 0;
struct bcmgenet_tx_ring *ring;
 
for (i = 0; i < priv->hw_params->tx_queues; ++i) {
ring = >tx_rings[i];
napi_enable(>napi);
+   int1_enable |= (1 << i);
}
 
ring = >tx_rings[DESC_INDEX];
napi_enable(>napi);
+
+   bcmgenet_intrl2_0_writel(priv, int0_enable, INTRL2_CPU_MASK_CLEAR);
+   bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
 }
 
 static void bcmgenet_disable_tx_napi(struct bcmgenet_priv *priv)
 {
unsigned int i;
+   u32 int0_disable = UMAC_IRQ_TXDMA_DONE;
+   u32 int1_disable = 0x;
struct bcmgenet_tx_ring *ring;
 
+   bcmgenet_intrl2_0_writel(priv, int0_disable, INTRL2_CPU_MASK_SET);
+   bcmgenet_intrl2_1_writel(priv, int1_disable, INTRL2_CPU_MASK_SET);
+
for (i = 0; i < priv->hw_params->tx_queues; ++i) {
ring = >tx_rings[i];
napi_disable(>napi);
@@ -2183,22 +2177,33 @@ static void bcmgenet_init_rx_napi(struct bcmgenet_priv 
*priv)
 static void bcmgenet_enable_rx_napi(struct bcmgenet_priv *priv)
 {
unsigned int i;
+   u32 int0_enable = UMAC_IRQ_RXDMA_DONE;
+   u32 int1_enable = 0;
struct bcmgenet_rx_ring *ring;
 
for (i = 0; i < priv->hw_params->rx_queues; ++i) {
ring = >rx_rings[i];
napi_enable(>napi);
+   int1_enable |= (1 << (UMAC_IRQ1_RX_INTR_SHIFT + i));
}
 
ring = >rx_rings[DESC_INDEX];
napi_enable(>napi);
+
+   bcmgenet_intrl2_0_writel(priv, int0_enable, INTRL2_CPU_MASK_CLEAR);
+   bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
 }
 
 static void bcmgenet_disable_rx_napi(struct bcmgenet_priv *priv)
 {
unsigned int i;
+   u32 int0_disable = UMAC_IRQ_RXDMA_DONE;
+   u32 int1_disable = 0x << UMAC_IRQ1_RX_INTR_SHIFT;
struct bcmgenet_rx_ring *ring;
 
+   bcmgenet_intrl2_0_writel(priv, int0_disable, INTRL2_CPU_MASK_SET);
+   bcmgenet_intrl2_1_writel(priv, int1_disable, INTRL2_CPU_MASK_SET);
+
for (i = 0; i < priv->hw_params->rx_queues; ++i) {
ring = >rx_rings[i];
napi_disable(>napi);
-- 
2.11.1

[PATCH net-next 10/12] dt-bindings: net: document bcmgenet WoL interrupt

2017-03-13 Thread Doug Berger

A third interrupt cell can be provided to optionally specify
the interrupt used for handling Wake on LAN events.

Typically the wake up handling uses a separate interrupt
controller, so the interrupts-extended property is used to
accommodate this.

Signed-off-by: Doug Berger 
---
 Documentation/devicetree/bindings/net/brcm,bcmgenet.txt | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt 
b/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt
index 10587bdadbbe..01a70463cbc5 100644
--- a/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt
+++ b/Documentation/devicetree/bindings/net/brcm,bcmgenet.txt
@@ -4,9 +4,12 @@ Required properties:
 - compatible: should contain one of "brcm,genet-v1", "brcm,genet-v2",
   "brcm,genet-v3", "brcm,genet-v4".
 - reg: address and length of the register set for the device
-- interrupts: must be two cells, the first cell is the general purpose
-  interrupt line, while the second cell is the interrupt for the ring
-  RX and TX queues operating in ring mode
+- interrupts and/or interrupts-extended: must be two cells, the first cell
+  is the general purpose interrupt line, while the second cell is the
+  interrupt for the ring RX and TX queues operating in ring mode.  An
+  optional third interrupt cell for Wake-on-LAN can be specified.
+  See Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
+  for information on the property specifics.
 - phy-mode: see ethernet.txt file in the same directory
 - #address-cells: should be 1
 - #size-cells: should be 1
-- 
2.11.1

[PATCH net-next 07/12] net: bcmgenet: clear status to reduce spurious interrupts

2017-03-13 Thread Doug Berger

Since the DMA interrupt status is latched and the DMA servicing can be
polled, it is a good idea to clear the latched status of a DMA interrupt
before performing the service that would be invoked by the interrupt.

This prevents old status from causing spurious interrupts when the
interrupt is unmasked at a later time.

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 661ca1b39c89..1f94ba1773dd 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1174,6 +1174,14 @@ static unsigned int __bcmgenet_tx_reclaim(struct 
net_device *dev,
unsigned int txbds_ready;
unsigned int txbds_processed = 0;
 
+   /* Clear status before servicing to reduce spurious interrupts */
+   if (ring->index == DESC_INDEX)
+   bcmgenet_intrl2_0_writel(priv, UMAC_IRQ_TXDMA_DONE,
+INTRL2_CPU_CLEAR);
+   else
+   bcmgenet_intrl2_1_writel(priv, (1 << ring->index),
+INTRL2_CPU_CLEAR);
+
/* Compute how many buffers are transmitted since last xmit call */
c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX)
& DMA_C_INDEX_MASK;
@@ -1584,10 +1592,21 @@ static unsigned int bcmgenet_desc_rx(struct 
bcmgenet_rx_ring *ring,
unsigned long dma_flag;
int len;
unsigned int rxpktprocessed = 0, rxpkttoprocess;
-   unsigned int p_index;
+   unsigned int p_index, mask;
unsigned int discards;
unsigned int chksum_ok = 0;
 
+   /* Clear status before servicing to reduce spurious interrupts */
+   if (ring->index == DESC_INDEX) {
+   bcmgenet_intrl2_0_writel(priv, UMAC_IRQ_RXDMA_DONE,
+INTRL2_CPU_CLEAR);
+   } else {
+   mask = 1 << (UMAC_IRQ1_RX_INTR_SHIFT + ring->index);
+   bcmgenet_intrl2_1_writel(priv,
+mask,
+INTRL2_CPU_CLEAR);
+   }
+
p_index = bcmgenet_rdma_ring_readl(priv, ring->index, RDMA_PROD_INDEX);
 
discards = (p_index >> DMA_P_INDEX_DISCARD_CNT_SHIFT) &
-- 
2.11.1

[PATCH net-next 12/12] net: bcmgenet: add support for the GENETv5 hardware

2017-03-13 Thread Doug Berger

This commit adds support for the GENETv5 implementation.

The GENETv5 reports a major version of 6 instead of 5 so compensate
for this when verifying the configuration of the driver.  Also the
EPHY revision is now contained in the MDIO registers of the PHY so
the EPHY revision of 0 in GENET_VER_FMT is expected for GENETv5.

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 91 --
 drivers/net/ethernet/broadcom/genet/bcmgenet.h | 12 +++-
 drivers/net/ethernet/broadcom/genet/bcmmii.c   | 62 ++
 drivers/net/phy/mdio-bcm-unimac.c  |  3 +-
 4 files changed, 118 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 3b49c14128e2..d848ac58189c 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1011,8 +1011,17 @@ static int bcmgenet_power_down(struct bcmgenet_priv 
*priv,
/* Power down LED */
if (priv->hw_params->flags & GENET_HAS_EXT) {
reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
-   reg |= (EXT_PWR_DOWN_PHY |
-   EXT_PWR_DOWN_DLL | EXT_PWR_DOWN_BIAS);
+   if (GENET_IS_V5(priv))
+   reg |= EXT_PWR_DOWN_PHY_EN |
+  EXT_PWR_DOWN_PHY_RD |
+  EXT_PWR_DOWN_PHY_SD |
+  EXT_PWR_DOWN_PHY_RX |
+  EXT_PWR_DOWN_PHY_TX |
+  EXT_IDDQ_GLBL_PWR;
+   else
+   reg |= EXT_PWR_DOWN_PHY;
+
+   reg |= (EXT_PWR_DOWN_DLL | EXT_PWR_DOWN_BIAS);
bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
 
bcmgenet_phy_power_set(priv->dev, false);
@@ -1037,12 +1046,34 @@ static void bcmgenet_power_up(struct bcmgenet_priv 
*priv,
 
switch (mode) {
case GENET_POWER_PASSIVE:
-   reg &= ~(EXT_PWR_DOWN_DLL | EXT_PWR_DOWN_PHY |
-   EXT_PWR_DOWN_BIAS);
-   /* fallthrough */
+   reg &= ~(EXT_PWR_DOWN_DLL | EXT_PWR_DOWN_BIAS);
+   if (GENET_IS_V5(priv)) {
+   reg &= ~(EXT_PWR_DOWN_PHY_EN |
+EXT_PWR_DOWN_PHY_RD |
+EXT_PWR_DOWN_PHY_SD |
+EXT_PWR_DOWN_PHY_RX |
+EXT_PWR_DOWN_PHY_TX |
+EXT_IDDQ_GLBL_PWR);
+   reg |=   EXT_PHY_RESET;
+   bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+   mdelay(1);
+
+   reg &=  ~EXT_PHY_RESET;
+   } else {
+   reg &= ~EXT_PWR_DOWN_PHY;
+   reg |= EXT_PWR_DN_EN_LD;
+   }
+   bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+   bcmgenet_phy_power_set(priv->dev, true);
+   bcmgenet_mii_reset(priv->dev);
+   break;
+
case GENET_POWER_CABLE_SENSE:
/* enable APD */
-   reg |= EXT_PWR_DN_EN_LD;
+   if (!GENET_IS_V5(priv)) {
+   reg |= EXT_PWR_DN_EN_LD;
+   bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+   }
break;
case GENET_POWER_WOL_MAGIC:
bcmgenet_wol_power_up_cfg(priv, mode);
@@ -1050,12 +1081,6 @@ static void bcmgenet_power_up(struct bcmgenet_priv *priv,
default:
break;
}
-
-   bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
-   if (mode == GENET_POWER_PASSIVE) {
-   bcmgenet_phy_power_set(priv->dev, true);
-   bcmgenet_mii_reset(priv->dev);
-   }
 }
 
 /* ioctl handle special commands that are not present in ethtool. */
@@ -3101,6 +3126,25 @@ static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
.flags = GENET_HAS_40BITS | GENET_HAS_EXT |
 GENET_HAS_MDIO_INTR | GENET_HAS_MOCA_LINK_DET,
},
+   [GENET_V5] = {
+   .tx_queues = 4,
+   .tx_bds_per_q = 32,
+   .rx_queues = 0,
+   .rx_bds_per_q = 0,
+   .bp_in_en_shift = 17,
+   .bp_in_mask = 0x1,
+   .hfb_filter_cnt = 48,
+   .hfb_filter_size = 128,
+   .qtag_mask = 0x3F,
+   .tbuf_offset = 0x0600,
+   .hfb_offset = 0x8000,
+   .hfb_reg_offset = 0xfc00,
+   .rdma_offset = 0x2000,
+   .tdma_offset = 0x4000,
+   .words_per_bd = 3,
+   .flags = GENET_HAS_40BITS |

[PATCH net-next 09/12] net: bcmgenet: return EOPNOTSUPP for unknown ioctl commands

2017-03-13 Thread Doug Berger

This commit changes the ioctl handling behavior to return the
EOPNOTSUPP error code instead of the EINVAL error code when an
unknown ioctl command value is detected.

It also removes some redundant parsing of the ioctl command value
and allows the SIOCSHWTSTAMP value to be handled.

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index d90d366b286f..3b49c14128e2 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1062,27 +1062,14 @@ static void bcmgenet_power_up(struct bcmgenet_priv 
*priv,
 static int bcmgenet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
struct bcmgenet_priv *priv = netdev_priv(dev);
-   int val = 0;
 
if (!netif_running(dev))
return -EINVAL;
 
-   switch (cmd) {
-   case SIOCGMIIPHY:
-   case SIOCGMIIREG:
-   case SIOCSMIIREG:
-   if (!priv->phydev)
-   val = -ENODEV;
-   else
-   val = phy_mii_ioctl(priv->phydev, rq, cmd);
-   break;
-
-   default:
-   val = -EINVAL;
-   break;
-   }
+   if (!priv->phydev)
+   return -ENODEV;
 
-   return val;
+   return phy_mii_ioctl(priv->phydev, rq, cmd);
 }
 
 static struct enet_cb *bcmgenet_get_txcb(struct bcmgenet_priv *priv,
-- 
2.11.1

[PATCH net-next 08/12] net: bcmgenet: correct return value of __bcmgenet_tx_reclaim

2017-03-13 Thread Doug Berger

The reclaim function should return the number of buffer descriptors
reclaimed, not just the number corresponding to skb packets.

Also, remove the unnecessary computation when updating the consumer
index.

While this is not a functional problem it could degrade performance
of napi in a fragmented transmit stream.
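
The relationship between the hardware consumer index and the count of
reclaimed descriptors can be illustrated with a small user-space sketch. It
is not driver code: C_INDEX_MASK and bds_ready() are hypothetical names, and
the 16-bit index width is an assumption made for the example.

```c
/* Assumed width of the ring consumer index (illustration only). */
#define C_INDEX_MASK 0xFFFFu

/*
 * Number of buffer descriptors the hardware has completed since the
 * software copy of the consumer index was last updated. Unsigned
 * subtraction followed by the mask handles index wraparound.
 */
static unsigned int bds_ready(unsigned int hw_c_index,
                              unsigned int sw_c_index)
{
    return (hw_c_index - sw_c_index) & C_INDEX_MASK;
}
```

If every ready descriptor is reclaimed, (sw_c_index + bds_ready) masked to
the index width equals hw_c_index, so storing the value read from hardware
is equivalent to recomputing it.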

Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 1f94ba1773dd..d90d366b286f 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1218,7 +1218,7 @@ static unsigned int __bcmgenet_tx_reclaim(struct 
net_device *dev,
}
 
ring->free_bds += txbds_processed;
-   ring->c_index = (ring->c_index + txbds_processed) & DMA_C_INDEX_MASK;
+   ring->c_index = c_index;
 
dev->stats.tx_packets += pkts_compl;
dev->stats.tx_bytes += bytes_compl;
@@ -1231,7 +1231,7 @@ static unsigned int __bcmgenet_tx_reclaim(struct 
net_device *dev,
netif_tx_wake_queue(txq);
}
 
-   return pkts_compl;
+   return txbds_processed;
 }
 
 static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
-- 
2.11.1

Re: bond procfs hw addr prints

2017-03-13 Thread Jay Vosburgh

Jarod Wilson  wrote:

>I've got a bug report for someone using Intel OPA devices in a bond, and
>it appears these devices have a hardware address length of 20, as opposed to
>the typical 6 on ethernet. When they dump /proc/net/bonding/bondX, it only
>prints the first 6 of the address, per %pM and mac_address_string(), while
>sysfs for the interface does print the right thing, since it uses
>sysfs_print_mac(), which takes a length argument.

This (20 octet MAC length) is true for any Infiniband device.

>So the question is... What's the best route to take here? Expand %pM to
>support variable length hardware addresses? Use sysfs_* in procfs?
>Reinvent the wheel? Nothing I've tinkered with just yet feels very clean,
>on top of not actually working yet. :)

sysfs_format_mac (not _print_mac) uses "%*phC", len, addr in its
format string.  Perhaps that format would be a better choice than %pM
for this case?
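
As a rough user-space illustration of what a length-aware, colon-separated
hex format produces, the sketch below formats an address of arbitrary
length. format_hw_addr() is a hypothetical helper written for this example;
it is not the kernel implementation of "%*phC".

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write len bytes of addr into buf as lowercase hex separated by ':'.
 * Returns the number of characters written (excluding the NUL), or -1
 * if buf is too small to hold the whole address.
 */
static int format_hw_addr(char *buf, size_t buflen,
                          const unsigned char *addr, size_t len)
{
    size_t pos = 0;
    size_t i;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (i = 0; i < len; i++) {
        /* two hex digits, a leading ':' after the first byte, plus NUL */
        size_t need = (i ? 3 : 2) + 1;

        if (buflen - pos < need)
            return -1;
        pos += (size_t)snprintf(buf + pos, buflen - pos, "%s%02x",
                                i ? ":" : "", addr[i]);
    }
    return (int)pos;
}
```

A 6-octet Ethernet address yields 17 characters and a 20-octet address
yields 59, so the same helper would cover both cases given a large enough
buffer.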

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH 09/11] net: stmmac: dma channel init prepared for multiple queues

2017-03-13 Thread David Miller

From: Joao Pinto 
Date: Mon, 13 Mar 2017 16:12:40 +

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index e60e077..44db2e3 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1732,6 +1732,10 @@ static void stmmac_check_ether_addr(struct stmmac_priv 
> *priv)
>   */
>  static int stmmac_init_dma_engine(struct stmmac_priv *priv)
>  {
> + u32 rx_channels_count = priv->plat->rx_queues_to_use;
> + u32 tx_channels_count = priv->plat->tx_queues_to_use;
> + u32 dummy_dma_rx_phy, dummy_dma_tx_phy = 0;
> + u32 chan = 0;
>   int atds = 0;
>   int ret = 0;
>  

dummy_dma_rx_phy is declared, but not initialized:

> @@ -1749,19 +1753,43 @@ static int stmmac_init_dma_engine(struct stmmac_priv 
> *priv)
>   return ret;
>   }
>  
> - priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
> - priv->dma_tx_phy, priv->dma_rx_phy, atds);
> -
>   if (priv->synopsys_id >= DWMAC_CORE_4_00) {
> - priv->rx_tail_addr = priv->dma_rx_phy +
> - (DMA_RX_SIZE * sizeof(struct dma_desc));
> - priv->hw->dma->set_rx_tail_ptr(priv->ioaddr, priv->rx_tail_addr,
> -STMMAC_CHAN0);
> + /* DMA Configuration */
> + priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
> + dummy_dma_tx_phy, dummy_dma_rx_phy, atds);
> +

Yet it is used here, still uninitialized.

The compiler even warns about this.
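
The underlying C rule is that in a declaration such as "u32 a, b = 0;" the
initializer belongs to b only. A minimal sketch of one possible fix follows;
fake_dma_init() and init_dma_engine_fixed() are hypothetical stand-ins for
illustration, not the stmmac code.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-in for the DMA init hook; it only combines inputs. */
static u32 fake_dma_init(u32 tx_phy, u32 rx_phy)
{
    return tx_phy | rx_phy;
}

static u32 init_dma_engine_fixed(void)
{
    /*
     * Give each declarator its own initializer, so that neither value
     * is indeterminate when passed to the init hook.
     */
    u32 dummy_dma_rx_phy = 0;
    u32 dummy_dma_tx_phy = 0;

    return fake_dma_init(dummy_dma_tx_phy, dummy_dma_rx_phy);
}
```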

Re: [RFC PATCH] sock: add SO_RCVQUEUE_SIZE getsockopt

2017-03-13 Thread David Miller

From: Josh Hunt 
Date: Mon, 13 Mar 2017 18:34:41 -0500

> In this particular case they really do want to know total # of bytes
> in the receive queue, not the data bytes they can consume from an
> application pov. The kernel currently only exposes this value through
> netlink or /proc/net/udp from what I saw.

Can you explain in what way this is useful?

The difference between skb->len and skb->truesize is really kernel
internal implementation detail, and I'm trying to figure out why
this would be useful to an application.

Re: [PATCH v2 net-next] qed*: Utilize Firmware 8.15.3.0

2017-03-13 Thread David Miller

From: Christoph Hellwig 
Date: Mon, 13 Mar 2017 16:19:47 -0700

> On Mon, Mar 13, 2017 at 03:33:47PM -0700, David Miller wrote:
>> Applied, thanks.
> 
> So everyone who doesn't have the very latests linux-firmware will now
> have a non-working card after upgrading the kernel?

I deeply regret that you've missed more than a decade of precedent on
this matter.

Re: [PATCH 3/7] ath9k: Add support for reading the EEPROM data using the nvmem API

2017-03-13 Thread Christian Lamparter

On Monday, March 13, 2017 10:05:11 PM CET Alban wrote:
> Currently SoC platforms use a firmware request to get the EEPROM data.
> This is mostly a hack and relies on user-helper scripts, which are
> deprecated. A nicer alternative is to use the nvmem API, which was
> designed for this kind of task.
> 
> Furthermore we let CONFIG_ATH9K_AHB select CONFIG_NVMEM as such
> devices will generally use this method for loading the EEPROM data.
> 
> Signed-off-by: Alban 
> ---
> @@ -654,6 +656,25 @@ static int ath9k_init_softc(u16 devid, struct ath_softc 
> *sc,
>   if (ret)
>   return ret;
>  
> + /* If the EEPROM hasn't been retrieved via firmware request
> +  * use the nvmem API insted.
> +  */
> + if (!ah->eeprom_blob) {
> + struct nvmem_cell *eeprom_cell;
> +
> + eeprom_cell = nvmem_cell_get(ah->dev, "eeprom");
> + if (!IS_ERR(eeprom_cell)) {
> + ah->eeprom_data = nvmem_cell_read(
> + eeprom_cell, >eeprom_size);
> + nvmem_cell_put(eeprom_cell);
> +
> + if (IS_ERR(ah->eeprom_data)) {
> + dev_err(ah->dev, "failed to read eeprom");
> + return PTR_ERR(ah->eeprom_data);
> + }
> + }
> + }
> +
>   if (ath9k_led_active_high != -1)
>   ah->config.led_active_high = ath9k_led_active_high == 1;
>  
Are you sure this works with AR93XX SoCs that have the calibration data
in the OTP?

I've added Chris to the CC, since he has a HiveAP 121 that has this
configuration, so he can test whether this is a problem or not.

But from what I can tell, devices with the calibration data in the
OTP would fail now. This is because the OTP is done by ath9k_hw_init()
which hasn't run yet (it's a bit further down the road in the same
function though). 

Note: the OTP doesn't store the whole caldata. Just a few parts.
A temporary "eeprom" gets constructed with the help of the 
ar9300_eep_templates in ar9003_eeprom.c's
ar9300_eeprom_restore_internal(). But from what I can tell, I don't think
that the eeprom_blob is constructed/set by the functions in
ar9003_eeprom.

I think this will require an additional property like
qca,calibration-in-otp. What's your opinion?

Thanks,
Christian

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Alexei Starovoitov

On Mon, Mar 13, 2017 at 04:44:19PM -0700, Eric Dumazet wrote:
> On Mon, Mar 13, 2017 at 4:40 PM, Alexei Starovoitov
>  wrote:
> > On Mon, Mar 13, 2017 at 04:28:04PM -0700, Eric Dumazet wrote:
> >> On Mon, Mar 13, 2017 at 4:21 PM, Alexei Starovoitov
> >>  wrote:
> >> >
> >> > is it once in the beginning only? If so then why that
> >> > 'if (!ring->page_cache.index)' check is done for every packet?
> >>
> >>
> >>
> >> You did not really read the patch, otherwise you would not ask these 
> >> questions.
> >
> > please explain. I see
> > +  if (!ring->page_cache.index) {
> > +  npage = mlx4_alloc_page(priv, ring,
> > which is done for every packet that goes via XDP_TX.
> >
> 
> Well, we do for all packets, even on hosts not running XDP:
> 
> if (xdp_prog) { ...
> 
> ...
> 
> Then :
> 
> if (doorbell_pending))
>  mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> 
> And nobody complained of few additional instructions.

it's not the additional 'if' I'm concerned about, it's the allocation
after the 'if', since you didn't explain clearly when it's executed.

> >> Test it, and if you find a regression, shout loudly.
> >
> > that's not how it works. It's a job of submitter to prove
> > that additional code doesn't cause regressions especially
> > when there are legitimate concerns.
> 
> I have no easy way to test XDP. I  have never used it and am not
> planning to use it any time soon.
> 
> Does it mean I no longer can participate to linux dev ?

when you're touching the most performance sensitive piece of XDP in
the driver then yes, you have to show that XDP doesn't regress.
Especially since it's trivial to test.
Two mlx4 servers, pktgen and xdp2 are enough.

[PATCH] net: mpls: Fix nexthop alive tracking on down events

2017-03-13 Thread David Ahern

Alive tracking of nexthops can account for a link twice if the carrier
goes down followed by an admin down of the same link rendering multipath
routes useless. This is similar to 79099aab38c8 for UNREGISTER events and
DOWN events.

Fix by tracking number of alive nexthops in mpls_ifdown similar to the
logic in mpls_ifup. Checking the flags per nexthop once after all events
have been processed is simpler than trying to maintain a running count
through all event combinations.
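
The recount approach can be sketched in isolation as below. The flag values
and struct are hypothetical simplifications for illustration, not the
definitions used in net/mpls.

```c
#include <stddef.h>

/* Hypothetical flag values, chosen only for this example. */
#define NH_F_DEAD     0x1u
#define NH_F_LINKDOWN 0x2u

struct nexthop {
    unsigned int flags;
};

/*
 * Recount alive nexthops from their flags. A nexthop that is marked
 * link-down and later also dead is still excluded exactly once, which
 * a running decrement per event would not guarantee.
 */
static unsigned int count_alive(const struct nexthop *nh, size_t n)
{
    const unsigned int down = NH_F_DEAD | NH_F_LINKDOWN;
    unsigned int alive = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(nh[i].flags & down))
            alive++;
    }
    return alive;
}
```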

Also, WRITE_ONCE is used instead of ACCESS_ONCE to set rt_nhn_alive
per a comment from checkpatch:
WARNING: Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR>

Fixes: c89359a42e2a4 ("mpls: support for dead routes")
Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 33211f9a2656..6414079aa729 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1269,6 +1269,8 @@ static void mpls_ifdown(struct net_device *dev, int event)
 {
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
+   unsigned int nh_flags = RTNH_F_DEAD | RTNH_F_LINKDOWN;
+   unsigned int alive;
unsigned index;
 
platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -1278,9 +1280,11 @@ static void mpls_ifdown(struct net_device *dev, int 
event)
if (!rt)
continue;
 
+   alive = 0;
change_nexthops(rt) {
if (rtnl_dereference(nh->nh_dev) != dev)
-   continue;
+   goto next;
+
switch (event) {
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
@@ -1288,13 +1292,16 @@ static void mpls_ifdown(struct net_device *dev, int 
event)
/* fall through */
case NETDEV_CHANGE:
nh->nh_flags |= RTNH_F_LINKDOWN;
-   if (event != NETDEV_UNREGISTER)
-   ACCESS_ONCE(rt->rt_nhn_alive) = 
rt->rt_nhn_alive - 1;
break;
}
if (event == NETDEV_UNREGISTER)
RCU_INIT_POINTER(nh->nh_dev, NULL);
+next:
+   if (!(nh->nh_flags & nh_flags))
+   alive++;
} endfor_nexthops(rt);
+
+   WRITE_ONCE(rt->rt_nhn_alive, alive);
}
 }
 
-- 
2.1.4

bond procfs hw addr prints

2017-03-13 Thread Jarod Wilson

I've got a bug report for someone using Intel OPA devices in a bond,
and it appears these devices have a hardware address length of 20,
as opposed to the typical 6 on ethernet. When they dump
/proc/net/bonding/bondX, it only prints the first 6 of the address, per 
%pM and mac_address_string(), while sysfs for the interface does print 
the right thing, since it uses sysfs_print_mac(), which takes a length 
argument.


So the question is... What's the best route to take here? Expand %pM to 
support variable length hardware addresses? Use sysfs_* in procfs? 
Reinvent the wheel? Nothing I've tinkered with just yet feels very 
clean, on top of not actually working yet. :)


--
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Eric Dumazet

On Mon, Mar 13, 2017 at 4:40 PM, Alexei Starovoitov
 wrote:
> On Mon, Mar 13, 2017 at 04:28:04PM -0700, Eric Dumazet wrote:
>> On Mon, Mar 13, 2017 at 4:21 PM, Alexei Starovoitov
>>  wrote:
>> >
>> > is it once in the beginning only? If so then why that
>> > 'if (!ring->page_cache.index)' check is done for every packet?
>>
>>
>>
>> You did not really read the patch, otherwise you would not ask these 
>> questions.
>
> please explain. I see
> +  if (!ring->page_cache.index) {
> +  npage = mlx4_alloc_page(priv, ring,
> which is done for every packet that goes via XDP_TX.
>

Well, we do for all packets, even on hosts not running XDP:

if (xdp_prog) { ...

...

Then :

if (doorbell_pending)
 mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);

And nobody complained about the few additional instructions.

Should I have, very loudly ?


>> Test it, and if you find a regression, shout loudly.
>
> that's not how it works. It's a job of submitter to prove
> that additional code doesn't cause regressions especially
> when there are legitimate concerns.

I have no easy way to test XDP. I have never used it and am not
planning to use it any time soon.

Does it mean I can no longer participate in linux dev ?

Nice to hear Alexei.

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Alexei Starovoitov

On Mon, Mar 13, 2017 at 04:28:04PM -0700, Eric Dumazet wrote:
> On Mon, Mar 13, 2017 at 4:21 PM, Alexei Starovoitov
>  wrote:
> >
> > is it once in the beginning only? If so then why that
> > 'if (!ring->page_cache.index)' check is done for every packet?
> 
> 
> 
> You did not really read the patch, otherwise you would not ask these 
> questions.

please explain. I see
+  if (!ring->page_cache.index) {
+  npage = mlx4_alloc_page(priv, ring,
which is done for every packet that goes via XDP_TX.

> Test it, and if you find a regression, shout loudly.

that's not how it works. It's a job of submitter to prove
that additional code doesn't cause regressions especially
when there are legitimate concerns.

Re: [net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases

2017-03-13 Thread Andy Zhou

>>> -   skb = skb_clone(skb, GFP_ATOMIC);
>>> -   if (!skb)
>>> -   /* Skip the sample action when out of memory. */
>>> -   return 0;
>>> +   if (key) {
>>> +   err = do_execute_actions(dp, skb, key, actions, rem);
>>> +   } else if (!add_deferred_actions(skb, orig, actions, rem)) {
>>>
>> We can refactor this code to avoid duplicating code across the action
>> implementations. This way there could be a single function dealing with
>> both the deferred action and key fifo arrays.
>
> O.K. I will make the change in the next version.

After looking more at it, the sample action and recirc action are different
enough that I don't see a clean way to refactor them. For example, recirc
needs to deal with setting recirc_id.  recirc calls
ovs_dp_process_packet, and sample calls do_execute_actions.

>> I am not sure if we can put sample or recirc in "may not change flow"
>> actions list. Consider the following set of actions:
>> sample(sample(set-IP)),userspace(),...
>> In this case the userspace action could receive an skb with an inconsistent
>> flow due to the preceding set action nested in the sample action.
>
> Good catch.  Will fix.

What's the objection to the recirc action? It always works on a cloned key.

Re: [RFC PATCH] sock: add SO_RCVQUEUE_SIZE getsockopt

2017-03-13 Thread Josh Hunt

On 03/13/2017 02:39 PM, David Miller wrote:
> From: Josh Hunt 
> Date: Mon, 13 Mar 2017 12:38:39 -0500
>
>> On 03/13/2017 11:12 AM, Eric Dumazet wrote:
>>> On Mon, Mar 13, 2017 at 8:59 AM, Josh Hunt  wrote:
>>>> Allows application to read the amount of data sitting in the receive
>>>> queue.
>>>>
>>>> Signed-off-by: Josh Hunt 
>>>> ---
>>>>
>>>> A team here is looking for a way to get the amount of data in a UDP
>>>> socket's receive queue. It seems like this should be SIOCINQ, but for
>>>> UDP sockets that returns the size of the next pending datagram. I
>>>> implemented the patch below, but am wondering if this is the right
>>>> place for this change? I was debating between this or a new UDP ioctl.
>>>
>>> But what is the 'amount of data' exactly ?
>>> Number of packets, amount of bytes to read from these packets ?
>>
>> I meant bytes. I will clarify in the next version.
>
> As Eric is hinting, the calculation you are using doesn't represent
> this.
>
> You need to do something like walk the receive queue and add the
> skb->len values together.
>
> sk->sk_rmem_alloc is usually much larger than the sum of the skb->len
> values in the socket receive queue.  I don't see how this culmination
> of skb->truesize values is useful, whereas I can see how an application
> could want the summation of the skb->len values.

In this particular case they really do want to know total # of bytes in 
the receive queue, not the data bytes they can consume from an 
application pov. The kernel currently only exposes this value through 
netlink or /proc/net/udp from what I saw.

I believe Eric's suggestion in his previous mail was to export all of 
these meminfo metrics via a single socket option call similar to how it's 
done in netlink. We could then use that for both call sites.

I agree that it would be useful to also have the data you and Eric are 
suggesting exposed somewhere, the total # of skb->len bytes sitting in 
the receive queue. I could add that as a second socket option.

Does this sound reasonable?

Josh
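
To make the two quantities in this exchange concrete, here is a small userspace model. It assumes each queued buffer carries a payload length and a larger charged size, as the thread describes for skb->len and skb->truesize; the struct, the function names and any sizes fed to them are made up for illustration and are not kernel code.

```c
#include <stddef.h>

/* Userspace model of a queued buffer: `len` is the payload an application
 * could read, `truesize` is the memory charged to the socket for it. */
struct skb_model {
	unsigned int len;
	unsigned int truesize;
};

/* Sum of payload bytes: what walking the queue and adding skb->len gives. */
static unsigned int queue_payload_bytes(const struct skb_model *q, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += q[i].len;
	return total;
}

/* Sum of charged bytes: the kind of value a memory-accounting counter
 * such as sk_rmem_alloc reflects. */
static unsigned int queue_charged_bytes(const struct skb_model *q, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += q[i].truesize;
	return total;
}
```

For two queued datagrams of 100 and 1400 payload bytes the first function returns 1500 regardless of how much memory each buffer pins, which is why the two numbers answer different questions.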

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Eric Dumazet

On Mon, Mar 13, 2017 at 4:21 PM, Alexei Starovoitov
 wrote:
>
> is it once in the beginning only? If so then why that
> 'if (!ring->page_cache.index)' check is done for every packet?

You did not really read the patch, otherwise you would not ask these questions.

Test it, and if you find a regression, shout loudly.

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Alexei Starovoitov

On Mon, Mar 13, 2017 at 02:09:23PM -0700, Eric Dumazet wrote:
> On Mon, Mar 13, 2017 at 1:23 PM, Alexei Starovoitov
>  wrote:
> > On Mon, Mar 13, 2017 at 11:58:05AM -0700, Eric Dumazet wrote:
> >> On Mon, Mar 13, 2017 at 11:31 AM, Alexei Starovoitov
> >>  wrote:
> >> > On Mon, Mar 13, 2017 at 10:50:28AM -0700, Eric Dumazet wrote:
> >> >> On Mon, Mar 13, 2017 at 10:34 AM, Alexei Starovoitov
> >> >>  wrote:
> >> >> > On Sun, Mar 12, 2017 at 05:58:47PM -0700, Eric Dumazet wrote:
> >> >> >> @@ -767,10 +814,30 @@ int mlx4_en_process_rx_cq(struct net_device 
> >> >> >> *dev, struct mlx4_en_cq *cq, int bud
> >> >> >>   case XDP_PASS:
> >> >> >>   break;
> >> >> >>   case XDP_TX:
> >> >> >> + /* Make sure we have one page ready to 
> >> >> >> replace this one */
> >> >> >> + npage = NULL;
> >> >> >> + if (!ring->page_cache.index) {
> >> >> >> + npage = mlx4_alloc_page(priv, 
> >> >> >> ring,
> >> >> >> + , 
> >> >> >> numa_mem_id(),
> >> >> >> + 
> >> >> >> GFP_ATOMIC | __GFP_MEMALLOC);
> >> >> >
> >> >> > did you test this with xdp2 test ?
> >> >> > under what conditions it allocates ?
> >> >> > It looks dangerous from security point of view to do allocations here.
> >> >> > Can it be exploited by an attacker?
> >> >> > we use xdp for ddos and lb and this is fast path.
> >> >> > If 1 out of 100s XDP_TX packets hit this allocation we will have 
> >> >> > serious
> >> >> > perf regression.
> >> >> > In general I dont think it's a good idea to penalize x86 in favor of 
> >> >> > powerpc.
> >> >> > Can you #ifdef this new code somehow? so we won't have these concerns 
> >> >> > on x86?
> >> >>
> >> >> Normal paths would never hit this point really. I wanted to be extra
> >> >> safe, because who knows, some guys could be tempted to set
> >> >> ethtool -G ethX  rx 512 tx 8192
> >> >>
> >> >> Before this patch, if you were able to push enough frames in TX ring,
> >> >> you would also eventually be forced to allocate memory, or drop 
> >> >> frames...
> >> >
> >> > hmm. not following.
> >> > Into xdp tx queues packets don't come from stack. It can only be via 
> >> > xdp_tx.
> >> > So this rx page belongs to driver, not shared with anyone and it only 
> >> > needs to
> >> > be put onto tx ring, so I don't understand why driver needs to allocating
> >> > anything here. To refill the rx ring? but why here?
> >>
> >> Because RX ring can be shared, by packets goind to the upper stacks (say 
> >> TCP)
> >>
> >> So there is no guarantee that the pages in the quarantine pool have
> >> their page count to 1.
> >>
> >> The normal TX_XDP path will recycle pages in ring->cache_page .
> >>
> >> This is exactly where I pick up a replacement.
> >>
> >> Pages in ring->cache_page have the guarantee to have no other users
> >> than ourself (mlx4 driver)
> >>
> >> You might have not noticed that current mlx4 driver has a lazy refill
> >> of RX ring buffers, that eventually
> >> removes all the pages from RX ring, and we have to recover with this
> >> lazy mlx4_en_recover_from_oom() thing
> >> that will attempt to restart the allocations.
> >>
> >> After my patch, we have the guarantee that the RX ring buffer is
> >> always fully populated.
> >>
> >> When we receive a frame (XDP or not), we drop it if we can not
> >> replenish the RX slot,
> >> in case the oldest page in quarantine is not a recycling candidate and
> >> we can not allocate a new page.
> >
> > Got it. Could you please add above explanation to the commit log,
> > since it's a huge difference vs other drivers.
> > I don't think any other driver in the tree follows this strategy.
> 
> sk_buff allocation can fail before considering adding a frag on the
> skb anyway...
> 
> Look, all this is completely irrelevant for XDP_TX, since there is no
> allocation at all,
> once ~128 pages have been put into page_cache.

is it once in the beginning only? If so then why that
'if (!ring->page_cache.index)' check is done for every packet?
If every 128 packets then it will cause performance drop.
Since I don't completely understand how your new recycling
logic works, I'm asking you to test this patch with samples/bpf/xdp2
to make sure perf is still good, since it doesn't sound that
you tested with xdp and I don't understand the patch enough to see
the impact and it makes me worried.

> Do you want to prefill this cache when XDP is loaded ?
> 
> > I think that's the right call, but it shouldn't be hidden in details.
> 
> ring->page_cache contains the pages used by XDP_TX are recycled
> through mlx4_en_rx_recycle()
> 
> There is a nice comment there. I did not change this part.
> 
> These pages _used_

Re: [PATCH v2 net-next] qed*: Utilize Firmware 8.15.3.0

2017-03-13 Thread Christoph Hellwig

On Mon, Mar 13, 2017 at 03:33:47PM -0700, David Miller wrote:
> Applied, thanks.

So everyone who doesn't have the very latest linux-firmware will now
have a non-working card after upgrading the kernel?

Re: [PATCH net-next 4/4] net-next: dsa: add dsa support for Mediatek MT7530 switch

2017-03-13 Thread Andrew Lunn

> +static int
> +mt7530_setup(struct dsa_switch *ds)
> +{
> + struct mt7530_priv *priv = ds->priv;
> + int ret, i, phy_mode;
> + u8  cpup_mask = 0;
> + u32 id, val;
> + struct regmap *regmap;
> +
> + /* Make sure that cpu port specfied on the dt is appropriate */
> + if (!dsa_is_cpu_port(ds, MT7530_CPU_PORT)) {
> + dev_err(priv->dev, "port not matched with the CPU port\n");
> + return -EINVAL;
> + }
> +
> + regmap = devm_regmap_init(ds->dev, NULL, priv,
> +   _regmap_config);
> + if (IS_ERR(regmap))
> + dev_warn(priv->dev, "phy regmap initialization failed");
> +
> + phy_mode = of_get_phy_mode(ds->ports[ds->dst->cpu_port].dn);
> + if (phy_mode < 0) {
> + dev_err(priv->dev, "Can't find phy-mode for master device\n");
> + return phy_mode;
> + }
> + dev_info(priv->dev, "phy-mode for master device = %x\n", phy_mode);

Hi Sean

It is not documented in the binding that a phy-mode is mandatory for
the cpu port.

Andrew

[PATCH] net: ethernet: aquantia: set net_device mtu when mtu is changed

2017-03-13 Thread David Arcari

When the aquantia device mtu is changed the net_device structure is not
updated.  As a result the ip command does not properly reflect the mtu change.

Commit 5513e16421cb incorrectly assumed that __dev_set_mtu() was making the
assignment ndev->mtu = new_mtu;  This is not true in the case where the driver
has a ndo_change_mtu routine.

Fixes: 5513e16421cb ("net: ethernet: aquantia: Fixes for aq_ndev_change_mtu")

Cc: Pavel Belous 
Signed-off-by: David Arcari 
---
 drivers/net/ethernet/aquantia/atlantic/aq_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_main.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
index dad6362..d05fbfd 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_main.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
@@ -98,6 +98,7 @@ static int aq_ndev_change_mtu(struct net_device *ndev, int 
new_mtu)
 
if (err < 0)
goto err_exit;
+   ndev->mtu = new_mtu;
 
if (netif_running(ndev)) {
aq_ndev_close(ndev);
-- 
1.8.3.1
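
The behaviour the commit message relies on can be modeled in a few lines of userspace C: the core helper assigns the MTU itself only when the driver has no change-MTU callback, so a driver that provides one must do the assignment. All names below are hypothetical models, not the kernel's structures.

```c
#include <stddef.h>

struct netdev_model;

struct netdev_ops_model {
	int (*ndo_change_mtu)(struct netdev_model *dev, int new_mtu);
};

struct netdev_model {
	int mtu;
	const struct netdev_ops_model *ops;
};

/* Model of the core helper: delegate to the driver callback if there is
 * one, otherwise store the new MTU directly. */
static int set_mtu_model(struct netdev_model *dev, int new_mtu)
{
	if (dev->ops && dev->ops->ndo_change_mtu)
		return dev->ops->ndo_change_mtu(dev, new_mtu);
	dev->mtu = new_mtu;
	return 0;
}

/* Driver callback that reconfigures hardware but forgets dev->mtu. */
static int buggy_change_mtu(struct netdev_model *dev, int new_mtu)
{
	(void)dev;
	(void)new_mtu;
	return 0;
}

/* Driver callback with the one-line fix from the patch applied. */
static int fixed_change_mtu(struct netdev_model *dev, int new_mtu)
{
	dev->mtu = new_mtu;
	return 0;
}
```

With the buggy callback the call succeeds but dev->mtu keeps its old value, which matches the symptom of the ip command not reflecting the change.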

Re: [PATCH net] vxlan: fix ovs support

2017-03-13 Thread David Miller

From: Nicolas Dichtel 
Date: Mon, 13 Mar 2017 16:24:03 +0100

> The required changes in the function vxlan_dev_create() were missing
> in commit 8bcdc4f3a20b.
> The vxlan device is not registered anymore after this patch and the error
> path causes an stack dump:
>  WARNING: CPU: 3 PID: 1498 at net/core/dev.c:6713 
> rollback_registered_many+0x9d/0x3f0
> 
> Fixes: 8bcdc4f3a20b ("vxlan: add changelink support")
> CC: Roopa Prabhu 
> Signed-off-by: Nicolas Dichtel 

Applied, thank you.

Re: [PATCH] net: use net->count to check whether a netns is alive or not

2017-03-13 Thread David Miller

From: Andrei Vagin 
Date: Sun, 12 Mar 2017 21:36:18 -0700

> The previous idea was to check whether a net namespace is in
> net_exit_list or not. It doesn't work, because net->exit_list is used in
> __register_pernet_operations and __unregister_pernet_operations where
> all namespaces are added to a temporary list to make cleanup in a error
> case, so list_empty(&net->exit_list) always returns false.
> 
> Reported-by: Mantas Mikulėnas 
> Fixes: 002d8a1a6c11 ("net: skip genenerating uevents for network namespaces 
> that are exiting")
> Signed-off-by: Andrei Vagin 

Applied and queued up for -stable, thanks.

Re: [PATCH] net: ethernet: aquantia: set net_device mtu when mtu is changed

2017-03-13 Thread David Arcari

On 03/13/2017 03:56 PM, David Miller wrote:
> From: David Arcari 
> Date: Mon, 13 Mar 2017 11:50:50 -0400
> 
>> On 03/13/2017 02:09 AM, David Miller wrote:
>>> From: David Arcari 
>>> Date: Wed,  8 Mar 2017 16:33:21 -0500
>>>
>>>> When the aquantia device mtu is changed the net_device structure is not
>>>> updated.  As a result the ip command does not properly reflect the mtu 
>>>> change.
>>>>
>>>> Commit 5513e16421cb incorrectly assumed that __dev_set_mtu() was making the
>>>> assignment ndev->mtu = new_mtu;  This is not true in the case where the 
>>>> driver
>>>> has a ndo_change_mtu routine.
>>>>
>>>> Fixes: 5513e16421cb ("net: ethernet: aquantia: Fixes for 
>>>> aq_ndev_change_mtu")
>>>>
>>>> Cc: Pavel Belous 
>>>> Signed-off-by: David Arcari 
>>>
>>> Applied, thanks.
>>>
>>
>> Hi David,
>>
>> I see that my patch:
>>
>> "net: ethernet: aquantia: call set_irq_affinity_hint before free_irq"
>>
>> has been applied to net, but I don't see that this patch has been applied.
> 
> It is marked as "changes requested" in patchwork, because you were asked to
> do the restart label removal in a separate patch.
> 


Sorry, I was not clear.  I was trying to ask if your "Applied, thanks" reply
(above) was meant for this email thread.  I believe it was meant for the
set_irq_affinity_hint patch.

Thanks,

-DA

Re: [PATCH net-next] net: dsa: mv88e6xxx: debug ATU Age Time

2017-03-13 Thread Andrew Lunn

On Mon, Mar 13, 2017 at 03:42:36PM -0700, Florian Fainelli wrote:
> On 03/13/2017 03:39 PM, Andrew Lunn wrote:
> > On Mon, Mar 13, 2017 at 03:20:43PM -0400, Vivien Didelot wrote:
> >> The ATU ageing time value programmed in the switch is rounded up to the
> >> nearest multiple of its coefficient (variable depending on the model.)
> >>
> >> Add a debug message to inform the user about the exact programmed value.
> >>
> >> On 6352, "brctl setageing br0 18" gives "AgeTime set to 0x01 (15000 ms)"
> >> while on 6390 we get "AgeTime set to 0x05 (18750 ms)".
> >>
> >> Signed-off-by: Vivien Didelot 
> >> ---
> >>  drivers/net/dsa/mv88e6xxx/global1_atu.c | 9 -
> >>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c 
> >> b/drivers/net/dsa/mv88e6xxx/global1_atu.c
> >> index f6cd3c939da4..bac34737b096 100644
> >> --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c
> >> +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c
> >> @@ -65,7 +65,14 @@ int mv88e6xxx_g1_atu_set_age_time(struct mv88e6xxx_chip 
> >> *chip,
> >>val &= ~0xff0;
> >>val |= age_time << 4;
> >>  
> >> -  return mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);
> >> +  err = mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);
> >> +  if (err)
> >> +  return err;
> >> +
> >> +  dev_dbg(chip->dev, "AgeTime set to 0x%02x (%d ms)\n", age_time,
> >> +  age_time * coeff);
> >> +
> > 
> > Hi Vivien
> > 
> > You could put the dev_dbg before the mv88e6xxx_g1_write(), to keep the
> > code simpler. If this write fails, we expect a lot of other things to
> > go horribly wrong, so having one debug message being not quite accurate
> > is not important.
> 
> The debug message would not be printed in case mv88e6xxx_g1_write()
> fails, also, having the message printed after the write occurred is a
> good way to make sure the write did make it through. Did I miss
> something in what you are suggesting here?

We never, ever see a read or a write failure on the MDIO bus. If we
ever do, I expect the switch is dead, gone, never to be heard from
again until the power is reset. We are going to have lots of
failures. So it seems simpler to have:

dev_dbg(chip->dev, "Setting AgeTime to 0x%02x (%d ms)\n", age_time,
age_time * coeff);

return mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);

and accept that if for some unlikely reason the write does fail, the
debug message is probably not accurate.

  Andrew
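
As a worked example of the numbers in the quoted commit message, the sketch below converts a requested ageing time into a register value and back. It assumes the request of 18 seconds arrives as 18000 ms and that the per-model coefficients are 15000 ms and 3750 ms; both coefficients are inferred from the two quoted outputs (0x01 giving 15000 ms, 0x05 giving 18750 ms). Rounding to the nearest multiple is also an assumption, chosen because it reproduces both quoted results. The function names are hypothetical.

```c
/* Convert a requested ageing time in ms into a register value, rounding to
 * the nearest multiple of the per-model coefficient (assumed behaviour). */
static unsigned int age_time_to_reg(unsigned int msecs, unsigned int coeff)
{
	return (msecs + coeff / 2) / coeff;
}

/* Convert the register value back into the time actually programmed. */
static unsigned int reg_to_age_time_ms(unsigned int reg, unsigned int coeff)
{
	return reg * coeff;
}
```

With a coefficient of 15000 the 18000 ms request maps to register value 1 (15000 ms); with 3750 it maps to 5 (18750 ms). Whether the value is printed before or after the register write does not change this arithmetic.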

Re: [PATCH net-next] net: dsa: mv88e6xxx: set out of range ageing time

2017-03-13 Thread Andrew Lunn

On Mon, Mar 13, 2017 at 03:19:32PM -0400, Vivien Didelot wrote:
> The minimum and maximum value of the ATU Age Time varies depending on
> the switch model. The current code returns -ERANGE for out-of-range
> values, and makes switchdev commit phase fail with this stacktrace:

Hi Vivien

I took a look at other switch drivers. mlxsw returns ERANGE in the
prepare phase. rocker is not limited, since it is using software
timers.

It seems like the correct way to do this is via a prepare call, or add
min/max fields to struct dsa_switch and let slave.c perform the check.

   Andrew
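
A rough sketch of the second option, checking against per-switch limits before anything is committed, could look like this. The struct, its field names and the function are hypothetical; nothing here claims that struct dsa_switch has such fields, and treating 0 as "no limit" is a design choice made only for this example.

```c
#include <errno.h>

/* Hypothetical per-switch limits, in milliseconds. 0 means "no limit". */
struct switch_limits_model {
	unsigned int ageing_time_min;
	unsigned int ageing_time_max;
};

/* Prepare-phase check: reject an out-of-range ageing time here so that the
 * commit phase never has to fail. */
static int ageing_time_prepare(const struct switch_limits_model *lim,
			       unsigned int msecs)
{
	if (lim->ageing_time_min && msecs < lim->ageing_time_min)
		return -ERANGE;
	if (lim->ageing_time_max && msecs > lim->ageing_time_max)
		return -ERANGE;
	return 0;
}
```

A driver backed by software timers would leave both limits at 0 and accept any value, while a hardware-limited switch would fill them in once at setup time.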

Re: [PATCH net-next] net: dsa: mv88e6xxx: debug ATU Age Time

2017-03-13 Thread Florian Fainelli

On 03/13/2017 03:39 PM, Andrew Lunn wrote:
> On Mon, Mar 13, 2017 at 03:20:43PM -0400, Vivien Didelot wrote:
>> The ATU ageing time value programmed in the switch is rounded up to the
>> nearest multiple of its coefficient (variable depending on the model.)
>>
>> Add a debug message to inform the user about the exact programmed value.
>>
>> On 6352, "brctl setageing br0 18" gives "AgeTime set to 0x01 (15000 ms)"
>> while on 6390 we get "AgeTime set to 0x05 (18750 ms)".
>>
>> Signed-off-by: Vivien Didelot 
>> ---
>>  drivers/net/dsa/mv88e6xxx/global1_atu.c | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c 
>> b/drivers/net/dsa/mv88e6xxx/global1_atu.c
>> index f6cd3c939da4..bac34737b096 100644
>> --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c
>> +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c
>> @@ -65,7 +65,14 @@ int mv88e6xxx_g1_atu_set_age_time(struct mv88e6xxx_chip 
>> *chip,
>>  val &= ~0xff0;
>>  val |= age_time << 4;
>>  
>> -return mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);
>> +err = mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);
>> +if (err)
>> +return err;
>> +
>> +dev_dbg(chip->dev, "AgeTime set to 0x%02x (%d ms)\n", age_time,
>> +age_time * coeff);
>> +
> 
> Hi Vivien
> 
> You could put the dev_dbg before the mv88e6xxx_g1_write(), to keep the
> code simpler. If this write fails, we expect a lot of other things to
> go horribly wrong, so having one debug message being not quite accurate
> is not important.

The debug message would not be printed in case mv88e6xxx_g1_write()
fails, also, having the message printed after the write occurred is a
good way to make sure the write did make it through. Did I miss
something in what you are suggesting here?
-- 
Florian

Re: [PATCH v7 0/6] Bluetooth: 6LoWPAN: Fix lladdr length

2017-03-13 Thread David Miller

From: Luiz Augusto von Dentz 
Date: Sun, 12 Mar 2017 10:19:32 +0200

> From: Luiz Augusto von Dentz 
> 
> These patches fix the lladdr length to be 6 bytes long and not 8, which causes
> neighbor advertisements to be sent with a wrong lladdr including the FF:FE
> filler bytes for eui64.
> 
> Note: This does not fix some of the existing crashes which I hope to address
> in a different set.
> 
> v2: Make all code paths that generate a link-local from lladdr use the same
> code.
> v3: Use lowpan_iphc_uncompress_eui48_lladdr to generate the remote ip address.
> v4: Handle comments from Stefan Schmidt.
> v5: Add patch to fix IID format for Bluetooth
> v6: Fix addrconf_ifid_eui48 to follow IID format for Bluetooth
> v7: Rework addrconf_ifid_6lowpan so it doesn't use addrconf_ifid_eui48

Since this is predominantly bluetooth/6lowpan stuff, please merge this via
the bluetooth tree.

Thanks!

Re: [PATCH net-next] net: dsa: mv88e6xxx: debug ATU Age Time

2017-03-13 Thread Andrew Lunn

On Mon, Mar 13, 2017 at 03:20:43PM -0400, Vivien Didelot wrote:
> The ATU ageing time value programmed in the switch is rounded up to the
> nearest multiple of its coefficient (variable depending on the model.)
> 
> Add a debug message to inform the user about the exact programmed value.
> 
> On 6352, "brctl setageing br0 18" gives "AgeTime set to 0x01 (15000 ms)"
> while on 6390 we get "AgeTime set to 0x05 (18750 ms)".
> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/mv88e6xxx/global1_atu.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c 
> b/drivers/net/dsa/mv88e6xxx/global1_atu.c
> index f6cd3c939da4..bac34737b096 100644
> --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c
> +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c
> @@ -65,7 +65,14 @@ int mv88e6xxx_g1_atu_set_age_time(struct mv88e6xxx_chip 
> *chip,
>   val &= ~0xff0;
>   val |= age_time << 4;
>  
> - return mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);
> + err = mv88e6xxx_g1_write(chip, GLOBAL_ATU_CONTROL, val);
> + if (err)
> + return err;
> +
> + dev_dbg(chip->dev, "AgeTime set to 0x%02x (%d ms)\n", age_time,
> + age_time * coeff);
> +

Hi Vivien

You could put the dev_dbg before the mv88e6xxx_g1_write(), to keep the
code simpler. If this write fails, we expect a lot of other things to
go horribly wrong, so having one debug message being not quite accurate
is not important.

   Andrew

Re: [PATCH] mpls: Do not decrement alive counter for unregister events

2017-03-13 Thread David Ahern

On 3/13/17 3:11 PM, David Ahern wrote:
> On 3/13/17 5:10 AM, Robert Shearman wrote:
>> Doesn't this leave the problem that if the device's link goes down and
>> then the device gets deleted the alive count will be decremented twice
>> for the same path?
> yes. and it exposes another bug in multipath selection.
> 

Never mind. I did not set the sysctl to keep ipv6 addresses; link down on
the veth device took out the address and route.

Re: [Patch net-next] atm: remove an unnecessary loop

2017-03-13 Thread David Miller

From: Chas Williams <3ch...@gmail.com>
Date: Sat, 11 Mar 2017 19:41:36 -0500

> From: Francois Romieu 
> 
> Andrey reported this kernel warning:
> 
> WARNING: CPU: 0 PID: 4114 at kernel/sched/core.c:7737 
> __might_sleep+0x149/0x1a0
> do not call blocking ops when !TASK_RUNNING; state=1 set at
> [] prepare_to_wait+0x182/0x530
> 
> The deeply nested alloc_skb is a problem.
> 
> Diagnosis: nesting is wrong. It makes zero sense. Fix it and the
> implicit task state change problem automagically goes away.
> 
> alloc_skb() does not need to be in the "while" loop.
> 
> alloc_skb() does not need to be in the {prepare_to_wait/add_wait_queue ...
> finish_wait/remove_wait_queue} block.
> 
> I claim that:
> - alloc_tx() should only perform the "wait_for_decent_tx_drain" part
> - alloc_skb() ought to be done directly in vcc_sendmsg
> - alloc_skb() failure can be handled gracefully in vcc_sendmsg
> - alloc_skb() may use a (m->msg_flags & MSG_DONTWAIT) dependent
>   GFP_{KERNEL / ATOMIC} flag
> 
> Reported-by: Andrey Konovalov 
> Reviewed-and-Tested-by: Chas Williams <3ch...@gmail.com>
> Signed-off-by: Chas Williams <3ch...@gmail.com>

Applied, thanks a lot Chas.

Re: [PATCH v2 net-next] qed*: Utilize Firmware 8.15.3.0

2017-03-13 Thread David Miller

From: Yuval Mintz 
Date: Sat, 11 Mar 2017 18:39:18 +0200

> This patch advances the qed* drivers into using the newer firmware -
> This solves several firmware bugs, mostly related [but not limited to]
> various init/deinit issues in various offloaded protocols.
> 
> It also introduces a major 4-Cached SGE change in firmware, which can be
> seen in the storage drivers' changes.
> 
> In addition, this firmware is required for supporting the new QL41xxx
> series of adapters; While this patch doesn't add the actual support,
> the firmware contains the necessary initialization & firmware logic to
> operate such adapters [actual support would be added later on].
> 
> Changes from Previous versions:
> ---
>  - V2 - fix kbuild-test robot warnings
> 
> Signed-off-by: Tomer Tayar 
> Signed-off-by: Ram Amrani 
> Signed-off-by: Manish Rangankar 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Yuval Mintz 

Applied, thanks.

Re: [PATCH] usbnet: smsc95xx: Reduce logging noise

2017-03-13 Thread David Miller

From: Guenter Roeck 
Date: Fri, 10 Mar 2017 17:45:21 -0800

> An insert/remove stress test generated the following log message sequence.
...
> Use netdev_dbg() instead of netdev_warn() for the repeating messages
> to reduce logging noise.
> 
> Signed-off-by: Guenter Roeck 

The problem I have with changes like this is that outside of your
stress test situation these messages are extremely useful but will
now be hidden making diagnosis of problems more difficult.

Perhaps you can check the error code or some piece of USB device
state to see if there was a disconnect, and elide the log message
in that case specifically.

Thanks.

Re: [PATCH net-next v3 0/2] mpls: allow TTL propagation to/from IP packets to be configured

2017-03-13 Thread David Miller

From: Robert Shearman 
Date: Fri, 10 Mar 2017 20:43:23 +

> It is sometimes desirable to present an MPLS transport network as a
> single hop to traffic transiting it because it prevents confusion when
> diagnosing failures. An example of where confusion can be generated is
> when addresses used in the provider network overlap with addresses in
> the overlay network and the addresses get exposed through ICMP errors
> generated as packets transit the provider network.
> 
> In addition, RFC 3443 defines two methods of deriving TTL for an
> outgoing packet: Uniform Model where the TTL is propagated to/from the
> MPLS header and both Pipe Models and Short Pipe Models (with and
> without PHP) where the TTL is not propagated to/from the MPLS header.
> 
> Changes in v3:
>  - decrement ttl on popping last label when not doing ttl propagation,
>as suggested by David Ahern.
>  - add comment to describe what the somewhat complex conditionals are
>doing to work out what ttl to use in mpls_iptunnel.c.
>  - rearrange fields fields in struct netns_mpls to keep the platform
>label fields together, as suggested by David Ahern.
> 
> Changes in v2:
>  - add references to RFC 3443 as suggested by David Ahern
>  - fix setting of skb->protocol as noticed by David Ahern
>  - implement per-route/per-LWT configurability as suggested by Eric
>Biederman
>  - split into two patches for ease of review

Series applied, thanks.

Re: [PATCH] net: usb: asix88179_178a: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread David Miller

From: Philippe Reynes 
Date: Sun, 12 Mar 2017 18:02:36 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: usb: catc: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread David Miller

From: Philippe Reynes 
Date: Sun, 12 Mar 2017 22:08:26 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: usb: r8152: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread David Miller

From: Philippe Reynes 
Date: Sun, 12 Mar 2017 22:41:58 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: net_netdev: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread David Miller

From: Philippe Reynes 
Date: Thu,  9 Mar 2017 23:10:13 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH v2] net: tun: use new api ethtool_{get|set}_link_ksettings

2017-03-13 Thread David Miller

From: Philippe Reynes 
Date: Sat, 11 Mar 2017 22:03:50 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 
> ---
> Changelog:
> v2:
> - Finaly, I've found the hardware and do basic test ;)
>   thanks Michael S. Tsirkin and Eric Dumazet for the feedback

:-)  Applied.

Re: [PATCH net-next 1/1 v2] net: rmnet_data: Initial implementation

2017-03-13 Thread kbuild test robot

Hi Subash,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Subash-Abhinov-Kasiviswanathan/net-rmnet_data-Initial-implementation/20170313-174754
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> include/uapi/linux/rmnet_data.h:46:25: sparse: invalid bitfield specifier for type restricted __le16.
   include/uapi/linux/rmnet_data.h:47:19: sparse: invalid bitfield specifier for type restricted __le16.

vim +46 include/uapi/linux/rmnet_data.h

30  #define RMNET_EGRESS_FORMAT_AGGREGATION (1<<2)
31  #define RMNET_EGRESS_FORMAT_MUXING  (1<<3)
32  #define RMNET_EGRESS_FORMAT_MAP_CKSUMV3 (1<<4)
33  #define RMNET_EGRESS_FORMAT_MAP_CKSUMV4 (1<<5)
34  
35  #define RMNET_INGRESS_FIX_ETHERNET  (1<<0)
36  #define RMNET_INGRESS_FORMAT_MAP            (1<<1)
37  #define RMNET_INGRESS_FORMAT_DEAGGREGATION  (1<<2)
38  #define RMNET_INGRESS_FORMAT_DEMUXING       (1<<3)
39  #define RMNET_INGRESS_FORMAT_MAP_COMMANDS   (1<<4)
40  #define RMNET_INGRESS_FORMAT_MAP_CKSUMV3    (1<<5)
41  #define RMNET_INGRESS_FORMAT_MAP_CKSUMV4    (1<<6)
42  
43  struct rmnet_nl_msg_s {
44  __le16 reserved;
45  __le16 message_type;
  > 46  __le16 reserved2:14;
47  __le16 crd:2;
48  union {
49  __le16 arg_length;
50  __le16 return_code;
51  };
52  union {
53  __u8 data[RMNET_NL_DATA_MAX_LEN];
54  struct {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
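
One common way to avoid the sparse complaint about bitfields on an endian-annotated type is to keep a whole 16-bit field and use explicit masks and shifts. The sketch below shows that pattern on a host-order value. The macro names, the helper names and the choice of the top two bits for crd are assumptions for illustration (bitfield bit order is implementation-defined, so the real layout would have to be decided by the patch author), and the byte-order conversion to __le16 is left out.

```c
#include <stdint.h>

/* Hypothetical layout: low 14 bits reserved, top 2 bits carry crd. */
#define EXAMPLE_CRD_SHIFT 14
#define EXAMPLE_CRD_MASK  0x3u

/* Store a 2-bit crd value into a host-order 16-bit flags word. */
static uint16_t example_set_crd(uint16_t flags, unsigned int crd)
{
	flags &= (uint16_t)~(EXAMPLE_CRD_MASK << EXAMPLE_CRD_SHIFT);
	flags |= (uint16_t)((crd & EXAMPLE_CRD_MASK) << EXAMPLE_CRD_SHIFT);
	return flags;
}

/* Extract the 2-bit crd value from a host-order 16-bit flags word. */
static unsigned int example_get_crd(uint16_t flags)
{
	return (flags >> EXAMPLE_CRD_SHIFT) & EXAMPLE_CRD_MASK;
}
```

In a uapi header the stored field would then be a plain __le16, converted once on the way in and out, which keeps the wire layout independent of compiler bitfield ordering.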

Re: [PATCH 5/7] ath9k: of: Use the clk API to get the reference clock rate

2017-03-13 Thread Rafał Miłecki


On 03/13/2017 10:05 PM, Alban wrote:

> @@ -573,6 +575,12 @@ static int ath9k_of_init(struct ath_softc *sc)
>
> ath_dbg(common, CONFIG, "parsing configuration from OF node\n");
>
> +   clk = clk_get(sc->dev, "ref");
> +   if (!IS_ERR(clk)) {
> +   ah->is_clk_25mhz = (clk_get_rate(clk) == 2500);

One trivial thing: you don't need these extra braces.

> +   clk_put(clk);
> +   }

Re: [PATCH 3/7] ath9k: Add support for reading the EEPROM data using the nvmem API

2017-03-13 Thread Rafał Miłecki


On 03/13/2017 10:05 PM, Alban wrote:

@@ -654,6 +656,25 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
if (ret)
return ret;

+   /* If the EEPROM hasn't been retrieved via firmware request
+* use the nvmem API insted.
+*/
+   if (!ah->eeprom_blob) {
+   struct nvmem_cell *eeprom_cell;
+
+   eeprom_cell = nvmem_cell_get(ah->dev, "eeprom");
+   if (!IS_ERR(eeprom_cell)) {
+   ah->eeprom_data = nvmem_cell_read(
+   eeprom_cell, &ah->eeprom_size);
+   nvmem_cell_put(eeprom_cell);
+
+   if (IS_ERR(ah->eeprom_data)) {
+   dev_err(ah->dev, "failed to read eeprom");


One trivial thing: missing line break.



+   return PTR_ERR(ah->eeprom_data);
+   }
+   }
+   }
+
if (ath9k_led_active_high != -1)
ah->config.led_active_high = ath9k_led_active_high == 1;

Re: [PATCH net-next RFC v1 00/27] afnetns: new namespace type for separation on protocol level

2017-03-13 Thread Eric W. Biederman

Michael Kerrisk  writes:

> On Mon, Mar 13, 2017 at 12:44 AM, Hannes Frederic Sowa
>  wrote:
>> Hi,
>>
>> On Sun, 2017-03-12 at 16:26 -0700, David Miller wrote:
>>> From: Hannes Frederic Sowa 
>>> Date: Mon, 13 Mar 2017 00:01:24 +0100
>>>
>>> > afnetns behaves like ordinary namespaces: clone, unshare, setns syscalls
>>> > can work with afnetns with one limitation: one cannot cross the realm
>>> > of a network namespace while changing the afnetns compartment. To get
>>> > into a new afnetns in a different net namespace, one must first change
>>> > to the net namespace and afterwards switch to the desired afnetns.
>>>
>>> Please explain why this is useful, who wants this kind of facility,
>>> and how it will be used.
>>
>> Yes, I have to enhance the cover letter:
>>
>> The work behind all this is to provide more dense container hosting.
>> Right now we lose performance, because all packets need to be forwarded
>> through either a bridge or must be routed until they reach the
>> containers. For example, we can't make use of early demuxing for the
>> incoming packets. We basically pass the networking stack twice for
>> every packet.
>>
>> The usage is very much in line with how network namespaces are used
>> nowadays:
>>
>> ip afnetns add afns-1
>> ip address add 192.168.1.1/24 dev eth0 afnetns afns-1
>> ip afnetns exec afns-1 /usr/sbin/httpd
>>
>> this spawns a shell where all child processes will only have access to
>> the specific ip addresses, even though they do a wildcard bind. Source
>> address selection will also use only the ip addresses available to the
>> children.
>>
>> In some sense it has lots of characteristics like ipvlan, allowing a
>> single MAC address to host lots of IP addresses which will end up in
>> different namespaces. Unlike ipvlan however, it will also solve the
>> problem around duplicate address detection and multiplexing packets to
>> the IGMP or MLD state machines.
>>
>> The resource consumption in comparison with ordinary namespaces will be
>> much lower. All in all, we will have far less networking subsystems to
>> cross compared to normal netns solutions.
>>
>> Some more information also in the first patch, which adds a
>> Documentation.

If the goal is one IP address per network namespace, with a network
device and MAC address on the network, I have something that I was
working on that I believe is in the end a much simpler solution.

Add routes in the routing table between network namespaces.

AKA in the initial network namespace with the network device, have
an input route not towards the local loopback device but towards
the network namespace's loopback device.

Before other issues took precedence I made it half way to implementing
that.   The ip input path won't get confused if the destination network
device is not in the same network namespace as the device.  Last I
looked the ip output path still had a few places where confusion was
possible between the network socket and the output device.

As long as installing such routes is conditional upon having
CAP_NET_ADMIN in both network namespaces, you should be fine, and things
should be very simple and very fast, because that won't take a special
case through the network stack.

Given that performance is your primary motive I suspect this will yield
the fastest possible path through the network stack as no extra steps
need to be taken, and can benefit from any routing improvements to the
ordinary network stack.

Eric

Re: [PATCH v2] net: ethernet: aquantia: set net_device mtu when mtu is changed

2017-03-13 Thread David Miller

From: David Arcari 
Date: Mon, 13 Mar 2017 16:30:52 -0400

> Please drop v2 of this patch.  We would like to have v1 applied.

Resubmit v1 again if you want me to do this.

Re: [PATCH net-next 1/1 v2] net: rmnet_data: Initial implementation

2017-03-13 Thread Subash Abhinov Kasiviswanathan


On 2017-03-13 02:54, Jiri Pirko wrote:

Mon, Mar 13, 2017 at 08:43:09AM CET, subas...@codeaurora.org wrote:

RmNet Data driver provides a transport agnostic MAP (multiplexing and


Why "data"? Why not just "rmnet"??

Btw, what is "RmNet". Google does not give me much. Is it some
proprietary Qualcomm thing? Is there some standard behind it?



Hi Jiri

Rm interface is used to describe an application processor tethered to
Qualcomm Technologies, Inc. modems. Since these are netdevices,
the term RmNet is used. I don't think there are published standards
available for this but I can provide relevant information and
document it as well.

rmnet was used for a USB based physical transport earlier, hence
this platform agnostic multiplexing driver was named as rmnet_data.

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project

Re: [PATCH 1/1] r8152: fix NULL pointer dereference in r8152_poll

2017-03-13 Thread Petr Vorel

> > > Unfortunately this doesn't work. Code in r8152.c doesn't use
> > > local_bh_enable()/local_bh_disable(). I tried to lock it with
> > > spin_lock_bh()/spin_unlock_bh() and with mutex_lock()/mutex_unlock()
> > > but neither work.

> > The local_bh_disable() / local_bh_enable() definitely is the right
> > answer to the issue you described.

> > It does not matter what code in r8152.c currently does.

> > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=8cf699ec849f4ca1413cea01289bd7d37dbcc626


> You also have to protect other napi_schedule(), like the ones in
> rtl_work_func_t() or rtl8152_post_reset()

I've tested that before :-). I'll be more precise about what "not working" means:
it fixes the invalid pointer issue, but the kernel crashes for a different reason:

 ...
Call Trace:
 <IRQ>
 net_rx_action+0x23c/0x3f0
 __do_softirq+0x104/0x2e1
 ? usb_runtime_suspend+0x70/0x70 [usbcore]
 do_softirq_own_stack+0x1c/0x30
 </IRQ>
 do_softirq.part.18+0x41/0x50
 __local_bh_enable_ip+0x88/0xa0
 rtl8152_resume+0xe2/0x1a0 [r8152]
 usb_resume_interface.isra.6+0x99/0xf0 [usbcore]
 usb_resume_both+0x6a/0x130 [usbcore]
 __rpm_callback+0xb9/0x1f0
 rpm_callback+0x5f/0x80
 ? usb_runtime_suspend+0x70/0x70 [usbcore]
 usb_resume+0x495/0x6b0
 ? update_load_avg+0x79/0x520
 ? update_load_avg+0x79/0x520
 ? refcount_dec_and_test+0x11/0x20
 __pm_runtime_resume+0x3f/0x60
 usb_autoresume_device+0x23/0x50 [usbcore]
 usb_dev_open+0xe7/0x250 [usbcore]
 chrdev_open+0xa1/0x200
 do_dentry_open+0x20a/0x2f0
 ? cdev_put+0x30/0x30
 vfs_open+0x4c/0x70
 ? may_open+0x9b/0x100
 path_openat+0x5ec/0x1430
 do_filp_open+0x7e/0xe0
 ? __vfs_write+0x28/0x140
 ? __alloc_fd+0xb2/0x160
 do_sys_open+0x123/0x200
 SyS_open+0x1e/0x20
 entry_SYSCALL_64_fastpath+0x1e/0xad
 ...
 Kernel panic - not syncing: Fatal exception in interrupt
 ...

Patch: http://pastebin.com/Uejjc0Bh (I'm not posting the patch here, as it's
not working).


Kind regards,
Petr

Re: [PATCH net] vxlan: fix ovs support

2017-03-13 Thread Roopa Prabhu

On 3/13/17, 8:24 AM, Nicolas Dichtel wrote:
> The required changes in the function vxlan_dev_create() were missing
> in commit 8bcdc4f3a20b.
> The vxlan device is not registered anymore after this patch and the error
> path causes a stack dump:
>  WARNING: CPU: 3 PID: 1498 at net/core/dev.c:6713 rollback_registered_many+0x9d/0x3f0
>
> Fixes: 8bcdc4f3a20b ("vxlan: add changelink support")
> CC: Roopa Prabhu 
> Signed-off-by: Nicolas Dichtel 
> ---
>  
Acked-by: Roopa Prabhu 

Thanks Nicolas.

[PATCH 7/7] ath9k: hw: Reset the device with the external reset before init

2017-03-13 Thread Alban

On the SoC platform the board code often manually resets the device
before registering it. To allow the same to happen on DT platforms
let the driver call the reset before init.

Signed-off-by: Alban 
---
 drivers/net/wireless/ath/ath9k/hw.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
index efc0435..dfb13bc 100644
--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -576,6 +576,13 @@ static int __ath9k_hw_init(struct ath_hw *ah)
struct ath_common *common = ath9k_hw_common(ah);
int r = 0;
 
+   /* Reset the device before using it */
+   r = ath9k_hw_external_reset(ah);
+   if (r) {
+   ath_err(common, "Failed to reset chip\n");
+   return r;
+   }
+
ath9k_hw_read_revisions(ah);
 
switch (ah->hw_version.macVersion) {
-- 
2.7.4

Re: [PATCH] mpls: Do not decrement alive counter for unregister events

2017-03-13 Thread David Ahern

On 3/13/17 5:10 AM, Robert Shearman wrote:
> Doesn't this leave the problem that if the device's link goes down and
> then the device gets deleted the alive count will be decremented twice
> for the same path?

yes. and it exposes another bug in multipath selection.

> 
> Perhaps it would be better to change the condition for decrementing the
> alive count to be "!(nh->nh_flags & (RTNH_F_LINKDOWN | RTNH_F_DEAD))"?

or maybe the logic in mpls_ifup is the way to go -- reset the alive
counter based on the sum of each nexthop's status.

I'll send more patches soon.
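
The "reset the alive counter from each nexthop's status" idea can be
sketched in user-space C as below. This is only an illustration under
stated assumptions, not the MPLS code: the struct and function names are
hypothetical, and the two flag values are local copies that mirror
RTNH_F_DEAD and RTNH_F_LINKDOWN from <linux/rtnetlink.h> so the sketch
builds standalone.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies mirroring RTNH_F_DEAD and RTNH_F_LINKDOWN. */
#define NH_F_DEAD      1u
#define NH_F_LINKDOWN 16u

/* Hypothetical, minimal stand-in for a nexthop entry. */
struct nexthop {
	unsigned int flags;
};

/* A nexthop counts as alive only if neither flag is set. */
static int nh_is_alive(const struct nexthop *nh)
{
	return !(nh->flags & (NH_F_DEAD | NH_F_LINKDOWN));
}

/* Recompute the alive counter from scratch instead of decrementing it,
 * so a link-down followed by an unregister cannot be counted twice.
 */
static unsigned int count_alive(const struct nexthop *nhs, size_t n)
{
	unsigned int alive = 0;
	size_t i;

	assert(nhs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (nh_is_alive(&nhs[i]))
			alive++;
	return alive;
}
```

Recounting costs a walk over the nexthops on every event, but the result
cannot drift when several events hit the same path.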

Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path

2017-03-13 Thread Eric Dumazet

On Mon, Mar 13, 2017 at 1:23 PM, Alexei Starovoitov
 wrote:
> On Mon, Mar 13, 2017 at 11:58:05AM -0700, Eric Dumazet wrote:
>> On Mon, Mar 13, 2017 at 11:31 AM, Alexei Starovoitov
>>  wrote:
>> > On Mon, Mar 13, 2017 at 10:50:28AM -0700, Eric Dumazet wrote:
>> >> On Mon, Mar 13, 2017 at 10:34 AM, Alexei Starovoitov
>> >>  wrote:
>> >> > On Sun, Mar 12, 2017 at 05:58:47PM -0700, Eric Dumazet wrote:
>> >> >> @@ -767,10 +814,30 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>> >> >>   case XDP_PASS:
>> >> >>   break;
>> >> >>   case XDP_TX:
>> >> >> + /* Make sure we have one page ready to replace this one */
>> >> >> + npage = NULL;
>> >> >> + if (!ring->page_cache.index) {
>> >> >> + npage = mlx4_alloc_page(priv, ring,
>> >> >> + , numa_mem_id(),
>> >> >> + GFP_ATOMIC | __GFP_MEMALLOC);
>> >> >
>> >> > did you test this with xdp2 test ?
>> >> > under what conditions it allocates ?
>> >> > It looks dangerous from security point of view to do allocations here.
>> >> > Can it be exploited by an attacker?
>> >> > we use xdp for ddos and lb and this is fast path.
>> >> > If 1 out of 100s XDP_TX packets hit this allocation we will have serious
>> >> > perf regression.
>> >> > In general I don't think it's a good idea to penalize x86 in favor of powerpc.
>> >> > Can you #ifdef this new code somehow? so we won't have these concerns on x86?
>> >>
>> >> Normal paths would never hit this point really. I wanted to be extra
>> >> safe, because who knows, some guys could be tempted to set
>> >> ethtool -G ethX  rx 512 tx 8192
>> >>
>> >> Before this patch, if you were able to push enough frames in TX ring,
>> >> you would also eventually be forced to allocate memory, or drop frames...
>> >
>> > hmm. not following.
>> > Into xdp tx queues packets don't come from stack. It can only be via xdp_tx.
>> > So this rx page belongs to driver, not shared with anyone and it only needs to
>> > be put onto tx ring, so I don't understand why driver needs to allocate
>> > anything here. To refill the rx ring? but why here?
>>
>> Because RX ring can be shared, by packets going to the upper stacks (say TCP)
>>
>> So there is no guarantee that the pages in the quarantine pool have
>> their page count to 1.
>>
>> The normal TX_XDP path will recycle pages in ring->cache_page .
>>
>> This is exactly where I pick up a replacement.
>>
>> Pages in ring->cache_page have the guarantee to have no other users
>> than ourself (mlx4 driver)
>>
>> You might have not noticed that current mlx4 driver has a lazy refill
>> of RX ring buffers, that eventually
>> removes all the pages from RX ring, and we have to recover with this
>> lazy mlx4_en_recover_from_oom() thing
>> that will attempt to restart the allocations.
>>
>> After my patch, we have the guarantee that the RX ring buffer is
>> always fully populated.
>>
>> When we receive a frame (XDP or not), we drop it if we can not
>> replenish the RX slot,
>> in case the oldest page in quarantine is not a recycling candidate and
>> we can not allocate a new page.
>
> Got it. Could you please add above explanation to the commit log,
> since it's a huge difference vs other drivers.
> I don't think any other driver in the tree follows this strategy.

sk_buff allocation can fail before considering adding a frag on the
skb anyway...

Look, all this is completely irrelevant for XDP_TX, since there is no
allocation at all,
once ~128 pages have been put into page_cache.

Do you want to prefill this cache when XDP is loaded ?

> I think that's the right call, but it shouldn't be hidden in details.

ring->page_cache contains the pages used by XDP_TX, which are recycled
through mlx4_en_rx_recycle()

There is a nice comment there. I did not change this part.

These pages _used_ to be directly taken from mlx4_en_prepare_rx_desc(),
because there was no page recycling for the normal path.

All my patch does is a generic implementation
for page recycling, leaving the XDP_TX pages in their own pool,
because we do not even have to check their page count, adding a cache line miss.

So I do have 2 different pools, simply to let XDP_TX path being ultra
fast, as before.

Does not look details to me.
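
The two-pool policy described above can be modelled in a few lines of
user-space C. This is a rough sketch under stated assumptions, not the mlx4
code: every type and function name is hypothetical, the allocator is a
stand-in that can be made to fail, and only the decision order is shown
(XDP_TX takes from its own cache without looking at the page count, the
normal path recycles the oldest quarantined page only when nobody else
holds it, and the frame is dropped when no replacement page exists).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XDP_CACHE_SIZE 4

/* Hypothetical page with a reference count. */
struct page {
	int refcount;
};

/* Hypothetical RX ring holding the two separate pools. */
struct rx_ring {
	struct page *xdp_cache[XDP_CACHE_SIZE]; /* owned by the driver only */
	size_t xdp_cached;
	struct page *quarantine_oldest;         /* may still be held by the stack */
	bool alloc_ok;                          /* stand-in for allocator success */
	struct page spare;                      /* the page the stand-in hands out */
};

/* Stand-in allocator: hands out the spare page once, if allowed. */
static struct page *try_alloc(struct rx_ring *r)
{
	if (!r->alloc_ok)
		return NULL;
	r->alloc_ok = false;
	r->spare.refcount = 1;
	return &r->spare;
}

/* XDP_TX pool: no page count check is needed for these pages. */
static struct page *replacement_for_xdp_tx(struct rx_ring *r)
{
	if (r->xdp_cached > 0)
		return r->xdp_cache[--r->xdp_cached];
	return try_alloc(r);
}

/* Normal path: recycle only if the driver is the sole user of the page. */
static struct page *replacement_for_stack(struct rx_ring *r)
{
	struct page *p = r->quarantine_oldest;

	if (p && p->refcount == 1) {
		r->quarantine_oldest = NULL;
		return p;
	}
	return try_alloc(r);
}

/* Returns true if the frame is accepted, false if it must be dropped
 * because the RX slot could not be replenished.
 */
static bool rx_accept(struct rx_ring *r, bool xdp_tx)
{
	struct page *p = xdp_tx ? replacement_for_xdp_tx(r)
				: replacement_for_stack(r);

	assert(r != NULL);
	return p != NULL;
}
```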


> If that's the right approach (probably is) we should do it
> in the other drivers too.
> Though I don't see you dropping "to the stack" packet in this patch.
> The idea is to drop the packet (whether xdp or not) if rx
> buffer cannot be replenished _regardless_ whether driver
> implements recycling, right?



>
> Theory vs

[PATCH 3/7] ath9k: Add support for reading the EEPROM data using the nvmem API

2017-03-13 Thread Alban

Currently SoC platforms use a firmware request to get the EEPROM data.
This is mostly a hack and relies on user-helper scripts, which are
deprecated. A nicer alternative is to use the nvmem API which was
designed for this kind of task.

Furthermore we let CONFIG_ATH9K_AHB select CONFIG_NVMEM as such
devices will generally use this method for loading the EEPROM data.

Signed-off-by: Alban 
---
 drivers/net/wireless/ath/ath9k/Kconfig  |  1 +
 drivers/net/wireless/ath/ath9k/eeprom.c | 10 ++
 drivers/net/wireless/ath/ath9k/hw.h |  2 ++
 drivers/net/wireless/ath/ath9k/init.c   | 21 +
 4 files changed, 34 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/Kconfig b/drivers/net/wireless/ath/ath9k/Kconfig
index 783a38f..1558c03 100644
--- a/drivers/net/wireless/ath/ath9k/Kconfig
+++ b/drivers/net/wireless/ath/ath9k/Kconfig
@@ -49,6 +49,7 @@ config ATH9K_PCI
 config ATH9K_AHB
bool "Atheros ath9k AHB bus support"
depends on ATH9K
+   select NVMEM
default n
---help---
  This option enables the AHB bus support in ath9k.
diff --git a/drivers/net/wireless/ath/ath9k/eeprom.c b/drivers/net/wireless/ath/ath9k/eeprom.c
index fb80ec8..1f28222 100644
--- a/drivers/net/wireless/ath/ath9k/eeprom.c
+++ b/drivers/net/wireless/ath/ath9k/eeprom.c
@@ -127,6 +127,14 @@ static bool ath9k_hw_nvram_read_pdata(struct ath9k_platform_data *pdata,
 offset, data);
 }
 
+static bool ath9k_hw_nvram_read_data(struct ath_hw *ah,
+off_t offset, u16 *data)
+{
+   return ath9k_hw_nvram_read_array(ah->eeprom_data,
+ah->eeprom_size / 2,
+offset, data);
+}
+
 static bool ath9k_hw_nvram_read_firmware(const struct firmware *eeprom_blob,
 off_t offset, u16 *data)
 {
@@ -143,6 +151,8 @@ bool ath9k_hw_nvram_read(struct ath_hw *ah, u32 off, u16 *data)
 
if (ah->eeprom_blob)
ret = ath9k_hw_nvram_read_firmware(ah->eeprom_blob, off, data);
+   else if (ah->eeprom_data)
+   ret = ath9k_hw_nvram_read_data(ah, off, data);
else if (pdata && !pdata->use_eeprom && pdata->eeprom_data)
ret = ath9k_hw_nvram_read_pdata(pdata, off, data);
else
diff --git a/drivers/net/wireless/ath/ath9k/hw.h b/drivers/net/wireless/ath/ath9k/hw.h
index 9cbca12..7f17c2a 100644
--- a/drivers/net/wireless/ath/ath9k/hw.h
+++ b/drivers/net/wireless/ath/ath9k/hw.h
@@ -970,6 +970,8 @@ struct ath_hw {
bool disable_5ghz;
 
const struct firmware *eeprom_blob;
+   void *eeprom_data;
+   size_t eeprom_size;
 
struct ath_dynack dynack;
 
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index fa4b3cc..054f254 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -511,6 +512,7 @@ static int ath9k_eeprom_request(struct ath_softc *sc, const char *name)
 static void ath9k_eeprom_release(struct ath_softc *sc)
 {
release_firmware(sc->sc_ah->eeprom_blob);
+   kfree(sc->sc_ah->eeprom_data);
 }
 
 static int ath9k_init_platform(struct ath_softc *sc)
@@ -654,6 +656,25 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
if (ret)
return ret;
 
+   /* If the EEPROM hasn't been retrieved via firmware request
+* use the nvmem API insted.
+*/
+   if (!ah->eeprom_blob) {
+   struct nvmem_cell *eeprom_cell;
+
+   eeprom_cell = nvmem_cell_get(ah->dev, "eeprom");
+   if (!IS_ERR(eeprom_cell)) {
+   ah->eeprom_data = nvmem_cell_read(
+   eeprom_cell, &ah->eeprom_size);
+   nvmem_cell_put(eeprom_cell);
+
+   if (IS_ERR(ah->eeprom_data)) {
+   dev_err(ah->dev, "failed to read eeprom");
+   return PTR_ERR(ah->eeprom_data);
+   }
+   }
+   }
+
if (ath9k_led_active_high != -1)
ah->config.led_active_high = ath9k_led_active_high == 1;
 
-- 
2.7.4
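
For readers who want to try the lookup order outside the kernel, here is a
minimal user-space sketch of the idea in this patch: a bounds-checked read
of one 16-bit word, with the firmware blob preferred over the nvmem data.
The struct and function names are hypothetical, the platform-data and
hardware fallbacks of the real driver are left out, and sizes are kept in
bytes and divided by two as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds-checked read of one 16-bit word; offset and size are in words. */
static bool read_word(const uint16_t *blob, size_t blob_words,
		      size_t offset, uint16_t *out)
{
	assert(out != NULL);
	if (blob == NULL || offset >= blob_words)
		return false;
	*out = blob[offset];
	return true;
}

/* Hypothetical container for the two EEPROM sources shown here. */
struct eeprom_sources {
	const uint16_t *fw_blob; /* data from a firmware request, or NULL */
	size_t fw_bytes;
	const uint16_t *nvmem;   /* data read from an nvmem cell, or NULL */
	size_t nvmem_bytes;
};

/* Prefer the firmware blob, then fall back to the nvmem data. */
static bool eeprom_read(const struct eeprom_sources *s, size_t offset,
			uint16_t *out)
{
	assert(s != NULL);
	if (s->fw_blob)
		return read_word(s->fw_blob, s->fw_bytes / 2, offset, out);
	if (s->nvmem)
		return read_word(s->nvmem, s->nvmem_bytes / 2, offset, out);
	return false;
}
```

A failed read leaves the output value untouched, so the caller can tell a
short cell from a valid word of zeros.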

[PATCH 5/7] ath9k: of: Use the clk API to get the reference clock rate

2017-03-13 Thread Alban

If a clock named "ref" exists use it to get the reference clock rate.

Signed-off-by: Alban 
---
 drivers/net/wireless/ath/ath9k/init.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 36b51a5..5cb9c61 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "ath9k.h"
@@ -564,6 +565,7 @@ static int ath9k_of_init(struct ath_softc *sc)
struct ath_hw *ah = sc->sc_ah;
struct ath_common *common = ath9k_hw_common(ah);
enum ath_bus_type bus_type = common->bus_ops->ath_bus_type;
+   struct clk *clk;
const char *mac;
char eeprom_name[100];
int ret;
@@ -573,6 +575,12 @@ static int ath9k_of_init(struct ath_softc *sc)
 
ath_dbg(common, CONFIG, "parsing configuration from OF node\n");
 
+   clk = clk_get(sc->dev, "ref");
+   if (!IS_ERR(clk)) {
+   ah->is_clk_25mhz = (clk_get_rate(clk) == 25000000);
+   clk_put(clk);
+   }
+
if (of_property_read_bool(np, "qca,no-eeprom")) {
/* ath9k-eeprom--.bin */
scnprintf(eeprom_name, sizeof(eeprom_name),
-- 
2.7.4

[PATCH 6/7] ath9k: Allow using the reset API for the external reset

2017-03-13 Thread Alban

From: Alban Bedel 

Allow using the reset API instead of a platform specific callback to
reset the device.

Signed-off-by: Alban Bedel 
---
 drivers/net/wireless/ath/ath9k/hw.c   | 26 +-
 drivers/net/wireless/ath/ath9k/hw.h   |  1 +
 drivers/net/wireless/ath/ath9k/init.c | 10 ++
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
index 8c5c2dd..efc0435 100644
--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hw.h"
@@ -551,6 +552,24 @@ static int ath9k_hw_attach_ops(struct ath_hw *ah)
return 0;
 }
 
+static int ath9k_hw_external_reset(struct ath_hw *ah)
+{
+   int ret = 0;
+
+   ath_dbg(ath9k_hw_common(ah), RESET,
+   "reset MAC via external reset\n");
+
+   if (ah->external_reset) {
+   ret = ah->external_reset();
+   } else if (ah->reset) {
+   ret = reset_control_assert(ah->reset);
+   if (!ret)
+   ret = reset_control_deassert(ah->reset);
+   }
+
+   return ret;
+}
+
 /* Called for all hardware families */
 static int __ath9k_hw_init(struct ath_hw *ah)
 {
@@ -1286,14 +1305,11 @@ static bool ath9k_hw_ar9330_reset_war(struct ath_hw *ah, int type)
break;
}
 
-   if (ah->external_reset &&
+   if ((ah->reset || ah->external_reset) &&
(npend || type == ATH9K_RESET_COLD)) {
int reset_err = 0;
 
-   ath_dbg(ath9k_hw_common(ah), RESET,
-   "reset MAC via external reset\n");
-
-   reset_err = ah->external_reset();
+   reset_err = ath9k_hw_external_reset(ah);
if (reset_err) {
ath_err(ath9k_hw_common(ah),
"External reset failed, err=%d\n",
diff --git a/drivers/net/wireless/ath/ath9k/hw.h b/drivers/net/wireless/ath/ath9k/hw.h
index 7f17c2a..53b67e3 100644
--- a/drivers/net/wireless/ath/ath9k/hw.h
+++ b/drivers/net/wireless/ath/ath9k/hw.h
@@ -966,6 +966,7 @@ struct ath_hw {
bool is_clk_25mhz;
int (*get_mac_revision)(void);
int (*external_reset)(void);
+   struct reset_control *reset;
bool disable_2ghz;
bool disable_5ghz;
 
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 5cb9c61..f1cb806 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "ath9k.h"
@@ -697,6 +698,15 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
if (!is_valid_ether_addr(common->macaddr))
ath9k_get_nvmem_address(sc);
 
+   /* Try to get a reset controller */
+   ah->reset = devm_reset_control_get_optional(sc->dev, NULL);
+   if (IS_ERR(ah->reset)) {
+   if (PTR_ERR(ah->reset) != -ENOENT &&
+   PTR_ERR(ah->reset) != -ENOTSUPP)
+   return PTR_ERR(ah->reset);
+   ah->reset = NULL;
+   }
+
/* If the EEPROM hasn't been retrieved via firmware request
 * use the nvmem API insted.
 */
-- 
2.7.4
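
The control flow of this patch can be exercised in user space with the
sketch below. It is an illustration under stated assumptions, not driver
code: the reset line is a hypothetical in-memory stand-in for a reset
controller, stub_reset_fails() is an example callback, and SKETCH_ENOTSUPP
is a local copy of the kernel-internal ENOTSUPP value (524), which the
user-space errno.h does not define.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local copy of the kernel-internal ENOTSUPP value. */
#define SKETCH_ENOTSUPP 524

/* Hypothetical stand-in for a reset controller line. */
struct reset_line {
	int asserted;
	int pulses;
};

/* Hypothetical stand-in for the relevant part of struct ath_hw. */
struct hw {
	int (*external_reset)(void); /* legacy platform callback */
	struct reset_line *reset;    /* optional reset line */
};

static int line_assert(struct reset_line *l)
{
	l->asserted = 1;
	return 0;
}

static int line_deassert(struct reset_line *l)
{
	l->asserted = 0;
	l->pulses++;
	return 0;
}

/* Example callback that always fails, used to show the priority order. */
static int stub_reset_fails(void)
{
	return -EIO;
}

/* Prefer the platform callback; otherwise pulse the reset line. */
static int hw_external_reset(struct hw *hw)
{
	int ret = 0;

	assert(hw != NULL);
	if (hw->external_reset) {
		ret = hw->external_reset();
	} else if (hw->reset) {
		ret = line_assert(hw->reset);
		if (!ret)
			ret = line_deassert(hw->reset);
	}
	return ret;
}

/* Map "no reset line described" errors to 0; pass real failures through. */
static int classify_lookup_error(int err)
{
	if (err == -ENOENT || err == -SKETCH_ENOTSUPP)
		return 0;
	return err;
}
```

Note that with neither a callback nor a reset line the helper returns 0, so
a caller that resets unconditionally before init does not fail on boards
that describe no reset at all.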

[PATCH 4/7] ath9k: Add support for reading the MAC address with nvmem

2017-03-13 Thread Alban

On embedded platforms the MAC address is often stored in flash,
use nvmem to read it if the platform data or DT didn't specify one.

Signed-off-by: Alban 
---
 drivers/net/wireless/ath/ath9k/init.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 054f254..36b51a5 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -594,6 +594,35 @@ static int ath9k_of_init(struct ath_softc *sc)
return 0;
 }
 
+static int ath9k_get_nvmem_address(struct ath_softc *sc)
+{
+   struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+   struct nvmem_cell *cell;
+   size_t cell_size;
+   int err = 0;
+   void *mac;
+
+   cell = nvmem_cell_get(sc->dev, "address");
+   if (IS_ERR(cell))
+   return PTR_ERR(cell);
+
+   mac = nvmem_cell_read(cell, &cell_size);
+   nvmem_cell_put(cell);
+
+   if (IS_ERR(mac))
+   return PTR_ERR(mac);
+
+   if (cell_size == 6) {
+   ether_addr_copy(common->macaddr, mac);
+   } else {
+   dev_err(sc->dev, "nvmem 'address' cell has invalid size\n");
+   err = -EINVAL;
+   }
+
+   kfree(mac);
+   return err;
+}
+
 static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
const struct ath_bus_ops *bus_ops)
 {
@@ -656,6 +685,10 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
if (ret)
return ret;
 
+   /* If no MAC address has been set yet try to use nvmem */
+   if (!is_valid_ether_addr(common->macaddr))
+   ath9k_get_nvmem_address(sc);
+
/* If the EEPROM hasn't been retrieved via firmware request
 * use the nvmem API insted.
 */
-- 
2.7.4
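
The size check and copy in this patch can be tried in user space with the
sketch below. It is a simplified illustration with hypothetical names:
mac_is_valid() applies the same rule as the kernel's is_valid_ether_addr()
(not multicast and not all zero), and mac_from_cell() stands in for the
part of ath9k_get_nvmem_address() that runs after the cell has been read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Valid means: multicast bit clear and not the all-zero address. */
static int mac_is_valid(const uint8_t mac[MAC_LEN])
{
	static const uint8_t zero[MAC_LEN];

	return !(mac[0] & 0x01) && memcmp(mac, zero, MAC_LEN) != 0;
}

/* Copy a cell into dst only if it is exactly six bytes long. */
static int mac_from_cell(uint8_t dst[MAC_LEN], const void *cell,
			 size_t cell_size)
{
	assert(dst != NULL && cell != NULL);
	if (cell_size != MAC_LEN)
		return -EINVAL;
	memcpy(dst, cell, MAC_LEN);
	return 0;
}
```

A caller would try the cell only while mac_is_valid() is false for the
address it already has, which matches the "if no MAC address has been set
yet" condition in the patch.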

[PATCH 2/7] ath9k: ahb: Add OF support

2017-03-13 Thread Alban

Allow registering ath9k AHB devices defined in DT. This just adds the
compatible strings to allow matching the driver and setting the proper
device ID.

Signed-off-by: Alban 
---
 drivers/net/wireless/ath/ath9k/ahb.c | 47 +---
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ahb.c b/drivers/net/wireless/ath/ath9k/ahb.c
index 2bd982c..36a2645 100644
--- a/drivers/net/wireless/ath/ath9k/ahb.c
+++ b/drivers/net/wireless/ath/ath9k/ahb.c
@@ -18,6 +18,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include "ath9k.h"
 
@@ -49,6 +50,33 @@ static const struct platform_device_id ath9k_platform_id_table[] = {
{},
 };
 
+#ifdef CONFIG_OF
+static const struct of_device_id ath_ahb_of_match[] = {
+   {
+   .compatible = "qca,ar9100-wmac",
+   .data = (void *)AR5416_AR9100_DEVID
+   },
+   {
+   .compatible = "qca,ar9330-wmac",
+   .data = (void *)AR9300_DEVID_AR9330
+   },
+   {
+   .compatible = "qca,ar9340-wmac",
+   .data = (void *)AR9300_DEVID_AR9340
+   },
+   {
+   .compatible = "qca,qca9550-wmac",
+   .data = (void *)AR9300_DEVID_QCA955X
+   },
+   {
+   .compatible = "qca,qca9530-wmac",
+   .data = (void *)AR9300_DEVID_AR953X
+   },
+   {},
+};
+MODULE_DEVICE_TABLE(of, ath_ahb_of_match);
+#endif
+
 /* return bus cachesize in 4B word units */
 static void ath_ahb_read_cachesize(struct ath_common *common, int *csz)
 {
@@ -79,10 +107,20 @@ static int ath_ahb_probe(struct platform_device *pdev)
int ret = 0;
struct ath_hw *ah;
char hw_name[64];
+   u16 devid;
 
-   if (!dev_get_platdata(&pdev->dev)) {
-   dev_err(&pdev->dev, "no platform data specified\n");
-   return -EINVAL;
+   if (id) {
+   devid = id->driver_data;
+   } else {
+   const struct of_device_id *match;
+
+   match = of_match_device(ath_ahb_of_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "no device match found\n");
+   return -EINVAL;
+   }
+
+   devid = (u16)(unsigned long)match->data;
}
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -127,7 +165,7 @@ static int ath_ahb_probe(struct platform_device *pdev)
goto err_free_hw;
}
 
-   ret = ath9k_init_device(id->driver_data, sc, &ath_ahb_bus_ops);
+   ret = ath9k_init_device(devid, sc, &ath_ahb_bus_ops);
if (ret) {
dev_err(&pdev->dev, "failed to initialize device\n");
goto err_irq;
@@ -167,6 +205,7 @@ static struct platform_driver ath_ahb_driver = {
.remove = ath_ahb_remove,
.driver = {
.name   = "ath9k",
+   .of_match_table = of_match_ptr(ath_ahb_of_match),
},
.id_table= ath9k_platform_id_table,
 };
-- 
2.7.4
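
The mapping from a compatible string to a device ID can be illustrated
with the user-space sketch below. It is a simplification under stated
assumptions: the table type and lookup function are hypothetical, the ID
values are placeholders rather than the real ath9k constants, and the loop
simply takes the first listed compatible string that has a table entry,
which is simpler than what of_match_device() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder IDs for illustration only; the real values live in the
 * ath9k headers.
 */
enum {
	DEVID_AR9100 = 1,
	DEVID_AR9330,
	DEVID_AR9340,
	DEVID_QCA955X,
	DEVID_AR953X
};

/* Hypothetical stand-in for struct of_device_id. */
struct match_entry {
	const char *compatible;
	uintptr_t data;
};

static const struct match_entry match_table[] = {
	{ "qca,ar9100-wmac",  DEVID_AR9100 },
	{ "qca,ar9330-wmac",  DEVID_AR9330 },
	{ "qca,ar9340-wmac",  DEVID_AR9340 },
	{ "qca,qca9550-wmac", DEVID_QCA955X },
	{ "qca,qca9530-wmac", DEVID_AR953X },
	{ NULL, 0 } /* sentinel */
};

/* Walk the node's compatible list, most specific string first, and
 * return the first table entry that matches, or NULL if none does.
 */
static const struct match_entry *match_compatible(const char *const *compat,
						  size_t n)
{
	const struct match_entry *e;
	size_t i;

	assert(compat != NULL || n == 0);
	for (i = 0; i < n; i++)
		for (e = match_table; e->compatible; e++)
			if (strcmp(compat[i], e->compatible) == 0)
				return e;
	return NULL;
}
```

This is why a node can list a specific string followed by a generic
fallback: the driver only needs an entry for the fallback.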

[PATCH 1/7] Documentation: dt: net: Update the ath9k binding for SoC devices

2017-03-13 Thread Alban

The current binding only covers PCI devices so extend it for SoC devices.

Most SoC platforms use an MTD partition for the calibration data
instead of an EEPROM. The qca,no-eeprom property was added to allow
loading the EEPROM content using firmware loading. This new binding
replaces this hack with NVMEM cells, so we also mark the qca,no-eeprom
property as deprecated in case anyone ever used it.

Signed-off-by: Alban 
---
 .../devicetree/bindings/net/wireless/qca,ath9k.txt | 41 --
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/wireless/qca,ath9k.txt b/Documentation/devicetree/bindings/net/wireless/qca,ath9k.txt
index b7396c8..61f5f6d 100644
--- a/Documentation/devicetree/bindings/net/wireless/qca,ath9k.txt
+++ b/Documentation/devicetree/bindings/net/wireless/qca,ath9k.txt
@@ -27,16 +27,34 @@ Required properties:
- 0034 for AR9462
- 0036 for AR9565
- 0037 for AR9485
+   For SoC devices the compatible should be "qca,-wmac"
+   and one of the following fallbacks:
+   - "qca,ar9100-wmac"
+   - "qca,ar9330-wmac"
+   - "qca,ar9340-wmac"
+   - "qca,qca9550-wmac"
+   - "qca,qca9530-wmac"
 - reg: Address and length of the register set for the device.
 
+Required properties for SoC devices:
+- interrupt-parent: phandle of the parent interrupt controller.
+- interrupts: Interrupt specifier for the controllers interrupt.
+
 Optional properties:
+- mac-address: See ethernet.txt in the parent directory
+- local-mac-address: See ethernet.txt in the parent directory
+- clock-names: has to be "ref"
+- clocks: phandle of the reference clock
+- resets: phandle of the reset line
+- nvmem-cell-names: has to be "eeprom" and/or "address"
+- nvmem-cells: phandle to the eeprom nvmem cell and/or to the mac address
+   nvmem cell.
+
+Deprecated properties:
 - qca,no-eeprom: Indicates that there is no physical EEPROM connected to the
ath9k wireless chip (in this case the calibration /
EEPROM data will be loaded from userspace using the
kernel firmware loader).
-- mac-address: See ethernet.txt in the parent directory
-- local-mac-address: See ethernet.txt in the parent directory
-
 
 In this example, the node is defined as child node of the PCI controller:
  {
@@ -46,3 +64,20 @@ In this example, the node is defined as child node of the PCI controller:
qca,no-eeprom;
};
 };
+
+In this example it is defined as a SoC device:
+   wmac@180c {
+   compatible = "qca,ar9132-wmac", "qca,ar9100-wmac";
+   reg = <0x180c 0x3>;
+
+   interrupt-parent = <>;
+   interrupts = <2>;
+
+   clock-names = "ref";
+   clocks = <>;
+
+   nvmem-cell-names = "eeprom", "address";
+   nvmem-cells = <_eeprom>, <_address>;
+
+   resets = < 22>;
+   };
-- 
2.7.4

Re: [RESEND PATCH -net] cpsw/netcp: cpts depends on posix_timers

2017-03-13 Thread Nicolas Pitre

On Mon, 13 Mar 2017, Nicolas Pitre wrote:

> So unless I'm mistaken I don't see any problem using "depends on PTP_1588_CLOCK" here.

Furthermore that wouldn't be a first. See for example PTP_1588_CLOCK_GIANFAR,
PTP_1588_CLOCK_IXP46X, DP83640_PHY, etc.


Nicolas
