date:20171017

Re: [PATCH 00/58] networking: Convert timers to use timer_setup()

2017-10-17 Thread Kalle Valo

Kees Cook  writes:

> On Tue, Oct 17, 2017 at 7:18 AM, Kalle Valo  wrote:
>> + linux-wireless
>>
>> Hi Kees,
>>
>> Kees Cook  writes:
>>
>>> This is the current set of outstanding networking patches to perform
>>> conversions to the new timer interface (rebased to -next). This is not
>>> all expected conversions, but it contains everything needed in networking
>>> to eliminate init_timer(), and all the non-standard setup_*_timer() uses.
>>
>> So this also includes patches which I had queued for
>> wireless-drivers-next:
>>
>> https://patchwork.kernel.org/patch/9986253/
>> https://patchwork.kernel.org/patch/9986245/
>>
>> And looking at patchwork[1] I have even more timer_setup() related
>> patches from you. It would be really helpful if you could clearly
>> document to which tree you want the patches to be applied. I don't care
>
> Hi! Sorry about that. It's been a bit tricky to juggle everything.

Yeah, I understand.

>> if it's net-next or wireless-drivers-next as long as it's not both
>> (meaning that both Dave and I apply the same patch, which would be
>> bad). The thing is that I really do not have time to figure out for
>> every patch via which tree it's supposed to go.
>
> Which split is preferred? I had been trying to separate wireless from
> the rest of net (but missed some cases).

So what we try to follow is that I apply all patches for
drivers/net/wireless to my wireless-drivers trees, with the exception of
Johannes taking mac80211_hwsim.c patches to his mac80211 trees. And
Johannes of course takes all patches for net/wireless and net/mac80211.

So in general I prefer that I take all drivers/net/wireless patches and
make it obvious for Dave that he can ignore those patches (not mix
wireless-drivers and net patches into same set etc). But like I said,
it's ok to push API changes like these via Dave's net trees as well if
you want (and if Dave is ok with that). The chance of conflicts is low,
and if there are any, they would be easy for either me or Dave to fix.

>> For now I'll just drop all your timer_setup() related patches from my
>> queue and I'll assume Dave will take those. Ok?
>>
>> [1] https://patchwork.kernel.org/project/linux-wireless/list/
>
> I guess I'll wait to see what Dave says.

Ok, I won't drop the patches from my queue quite yet then.

-- 
Kalle Valo

Re: [PATCH v9 00/20] simplify crypto wait for async op

2017-10-17 Thread Gilad Ben-Yossef

On Tue, Oct 17, 2017 at 5:06 PM, Russell King - ARM Linux
 wrote:
> On Sun, Oct 15, 2017 at 10:19:45AM +0100, Gilad Ben-Yossef wrote:
>> Many users of kernel async. crypto services have a pattern of
>> starting an async. crypto op and then using a completion
>> to wait for it to end.
>>
>> This patch set simplifies this common use case in two ways:
>>
>> First, by separating the return codes of the case where a
>> request is queued to a backlog due to the provider being
>> busy (-EBUSY) from the case the request has failed due
>> to the provider being busy and backlogging is not enabled
>> (-EAGAIN).
>>
>> Next, this change is then built on to create a generic API
>> to wait for an async. crypto operation to complete.
>>
>> The end result is a smaller code base and an API that is
>> easier to use and more difficult to get wrong.
>>
>> The patch set was boot tested on x86_64 and arm64 which
>> at the very least tests the crypto users via testmgr and
>> tcrypt but I do note that I do not have access to some
>> of the HW whose drivers are modified nor do I claim I was
>> able to test all of the corner cases.
>>
>> The patch set is based upon linux-next release tagged
>> next-20171013.
>
> Has there been any performance impact analysis of these changes?  I
> ended up with patches for one of the crypto drivers which converted
> its interrupt handling to threaded interrupts being reverted because
> it caused a performance degradation.
>
> Moving code to latest APIs to simplify it is not always beneficial.

I agree with the sentiment but I believe this one is justified.

This patch set basically does 3 things:

1. Replace one immediate value (-EBUSY) by another (-EAGAIN). Mostly it's
just s/EBUSY/EAGAIN/g. In very few places this resulted in very trivial
code changes. I don't foresee this having any effect on performance.

2. Removal of some conditions and/or conditional jumps that were used to
discern between two different cases which are now easily tested for by
the different return value. If anything, this will be an increase in
performance, although I don't expect it to be noticeable.

3. Replacing a whole bunch of open coded code and data structures which
were pretty much cut and pasted from the Documentation and therefore
identical, with a single copy thereof.

Every place I found that deviated slightly from the identical pattern
turned out to be a bug of some sort, and patches for those were sent and
accepted already.

So, we might be losing a few inline optimization opportunities but we're
gaining better cache utilization. Again, I don't expect any of this to
have a noticeable effect in either direction.

I did run the changed code as best I could and did not notice any
performance changes, and none of the testers and maintainers that ACKed
mentioned any.

Having said that, it's a big change that touches many places,
sub-systems and drivers. I do not claim to have thoroughly tested all
the changes for performance in person. In some cases, I don't even have
access to the specialized hardware. I did get a reasonable amount of
review and testing, I believe - would always love to see more :-)

Many thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru

Re: [Intel-wired-lan] [PATCH] [v2] i40e: avoid 64-bit division where possible

2017-10-17 Thread Nambiar, Amritha

On 10/17/2017 10:33 AM, Alexander Duyck wrote:
> On Tue, Oct 17, 2017 at 8:49 AM, Arnd Bergmann  wrote:
>> The new bandwidth calculation caused a link error on 32-bit
>> architectures, like
>>
>> ERROR: "__aeabi_uldivmod" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined!
>>
>> The problem is the max_tx_rate calculation that uses 64-bit integers.
>> This is not really necessary since the numbers are in MBit/s so
>> they won't be higher than 4 for the highest support rate, and
>> are guaranteed to not exceed 2^32 in future generations either.
>>
>> Another patch from Alan Brady fixed the link error by adding
>> many calls to do_div(), which makes the code less efficient and
>> less readable than necessary.
>>
>> This changes the representation to 'u32' when dealing with MBit/s
>> and uses div_u64() to convert from u64 numbers in byte/s, reverting
>> parts of Alan's earlier fix that have become obsolete now.
>>

This patch breaks functionality when converting the rates in bytes/s
provided by the tc layer into Mbit/s in the driver.
I40E_BW_MBPS_DIVISOR, defined in Alan's patch, should be used for that
conversion, not I40E_BW_CREDIT_DIVISOR, which gives the wrong result.
I40E_BW_CREDIT_DIVISOR is in place because the device uses credit
rates in units of 50 Mbps.
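
To make the two steps concrete, here is a small userspace sketch of the
arithmetic. The macro values are assumptions for illustration (125000
bytes/s per Mbit/s follows from 1000000 / 8; 50 follows from the credit
size mentioned above), and the helper names are hypothetical. In the
kernel, the 64-bit division would go through div_u64() so that 32-bit
builds link.

```c
#include <stdint.h>

/* Assumed values, named after the macros discussed in this thread. */
#define BW_MBPS_DIVISOR   125000u	/* bytes/s per Mbit/s: 1000000 / 8 */
#define BW_CREDIT_DIVISOR 50u		/* one credit = 50 Mbit/s */

/* Step 1: tc hands over bytes/s; the driver stores Mbit/s in a u32. */
static uint32_t bytes_per_sec_to_mbps(uint64_t rate)
{
	return (uint32_t)(rate / BW_MBPS_DIVISOR);
}

/* Step 2: the device is programmed in 50 Mbit/s credits. */
static uint32_t mbps_to_credits(uint32_t mbps)
{
	return mbps / BW_CREDIT_DIVISOR;
}
```

For example, 125000000 bytes/s is 1000 Mbit/s, which is 20 credits.
Dividing the bytes/s value by the credit divisor instead would give
2500000, which is neither a rate in Mbit/s nor a credit count.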

>> Fixes: 2027d4deacb1 ("i40e: Add support setting TC max bandwidth rates")
>> Fixes: 73983b5ae011 ("i40e: fix u64 division usage")
>> Cc: Alan Brady 
>> Signed-off-by: Arnd Bergmann 
> 
> So this patch looks good to me, we just need to test it to verify it
> doesn't break existing functionality. In the meantime if Alan's patch
> has gone through testing we should probably push "i40e: fix u64
> division usage" to Dave so that we can at least fix the linking issues
> on ARM and i386.
> 
> Reviewed-by: Alexander Duyck 
> 
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e.h  |  4 +-
>>  drivers/net/ethernet/intel/i40e/i40e_main.c | 70 
>> +++--
>>  2 files changed, 27 insertions(+), 47 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
>> index c3f13120f3ce..c7aa0c982273 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>> @@ -407,7 +407,7 @@ struct i40e_channel {
>> u8 enabled_tc;
>> struct i40e_aqc_vsi_properties_data info;
>>
>> -   u64 max_tx_rate;
>> +   u32 max_tx_rate; /* in Mbits/s */
>>
>> /* track this channel belongs to which VSI */
>> struct i40e_vsi *parent_vsi;
>> @@ -1101,5 +1101,5 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
>>  }
>>
>>  int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
>> -int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate);
>> +int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u32 max_tx_rate);
>>  #endif /* _I40E_H_ */
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> index 3ceda140170d..57682cc78508 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -5448,17 +5448,16 @@ int i40e_get_link_speed(struct i40e_vsi *vsi)
>>   *
>>   * Helper function to set BW limit for a given VSI
>>   **/
>> -int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate)
>> +int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u32 max_tx_rate)
>>  {
>> struct i40e_pf *pf = vsi->back;
>> -   u64 credits = 0;
>> int speed = 0;
>> int ret = 0;
>>
>> speed = i40e_get_link_speed(vsi);
>> if (max_tx_rate > speed) {
>> dev_err(&pf->pdev->dev,
>> -   "Invalid max tx rate %llu specified for VSI seid %d.",
>> +   "Invalid max tx rate %u specified for VSI seid %d.",
>> max_tx_rate, seid);
>> return -EINVAL;
>> }
>> @@ -5469,13 +5468,12 @@ int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate)
>> }
>>
>> /* Tx rate credits are in values of 50Mbps, 0 is disabled */
>> -   credits = max_tx_rate;
>> -   do_div(credits, I40E_BW_CREDIT_DIVISOR);
>> -   ret = i40e_aq_config_vsi_bw_limit(&pf->hw, seid, credits,
>> +   ret = i40e_aq_config_vsi_bw_limit(&pf->hw, seid,
>> + max_tx_rate / I40E_BW_CREDIT_DIVISOR,
>>   I40E_MAX_BW_INACTIVE_ACCUM, NULL);
>> if (ret)
>> dev_err(&pf->pdev->dev,
>> -   "Failed set tx rate (%llu Mbps) for vsi->seid %u, err %s aq_err %s\n",
>> +   "Failed set tx rate (%u Mbps) for vsi->seid %u, err %s aq_err %s\n",
>> max_tx_rate, seid, i40e_stat_str(&pf->hw, ret),
>>

RE: [PATCH net v2 2/2] net: fec: Let fec_ptp have its own interrupt routine

2017-10-17 Thread Andy Duan

From: Troy Kisky  Sent: Wednesday, October 18, 2017 5:34 AM
>>> This is better for code locality and should slightly speed up normal
>>> interrupts.
>>>
>>> This also allows PPS clock output to start working for i.mx7. This is
>>> because
>>> i.mx7 was already using the limit of 3 interrupts, and needed another.
>>>
>>> Signed-off-by: Troy Kisky 
>>>
>>> ---
>>>
>>> v2: made this change independent of any devicetree change so that old
>>> dtbs continue to work.
>>>
>>> Continue to register ptp clock if interrupt is not found.
>>> ---
>>> drivers/net/ethernet/freescale/fec.h  |  3 +-
>>> drivers/net/ethernet/freescale/fec_main.c | 25 ++
>>> drivers/net/ethernet/freescale/fec_ptp.c  | 82
>>> ++
>>> -
>>> 3 files changed, 65 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/freescale/fec.h
>>> b/drivers/net/ethernet/freescale/fec.h
>>> index ede1876a9a19..be56ac1f1ac4 100644
>>> --- a/drivers/net/ethernet/freescale/fec.h
>>> +++ b/drivers/net/ethernet/freescale/fec.h
>>> @@ -582,12 +582,11 @@ struct fec_enet_private {
>>> u64 ethtool_stats[0];
>>> };
>>>
>>> -void fec_ptp_init(struct platform_device *pdev);
>>> +void fec_ptp_init(struct platform_device *pdev, int irq_index);
>>> void fec_ptp_stop(struct platform_device *pdev);  void
>>> fec_ptp_start_cyclecounter(struct net_device *ndev);  int
>>> fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);  int
>>> fec_ptp_get(struct net_device *ndev, struct ifreq *ifr); -uint
>>> fec_ptp_check_pps_event(struct fec_enet_private *fep);
>>>
>>>
>>>
>/**
>>> **/
>>> #endif /* FEC_H */
>>> diff --git a/drivers/net/ethernet/freescale/fec_main.c
>>> b/drivers/net/ethernet/freescale/fec_main.c
>>> index 3dc2d771a222..21afabbc560f 100644
>>> --- a/drivers/net/ethernet/freescale/fec_main.c
>>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>>> @@ -1602,10 +1602,6 @@ fec_enet_interrupt(int irq, void *dev_id)
>>> ret = IRQ_HANDLED;
>>> complete(&fep->mdio_done);
>>> }
>>> -
>>> -   if (fep->ptp_clock)
>>> -   if (fec_ptp_check_pps_event(fep))
>>> -   ret = IRQ_HANDLED;
>>> return ret;
>>> }
>>>
>>> @@ -3325,6 +3321,8 @@ fec_probe(struct platform_device *pdev)
>>> struct device_node *np = pdev->dev.of_node, *phy_node;
>>> int num_tx_qs;
>>> int num_rx_qs;
>>> +   char irq_name[8];
>>> +   int irq_cnt;
>>>
>>> fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
>>>
>>> @@ -3465,18 +3463,27 @@ fec_probe(struct platform_device *pdev)
>>> if (ret)
>>> goto failed_reset;
>>>
>>> +   irq_cnt = platform_irq_count(pdev);
>>> +   if (irq_cnt > FEC_IRQ_NUM)
>>> +   irq_cnt = FEC_IRQ_NUM;  /* last for ptp */
>>> +   else if (irq_cnt == 2)
>>> +   irq_cnt = 1;/* last for ptp */
>>> +   else if (irq_cnt <= 0)
>>> +   irq_cnt = 1;/* Let the for loop fail */
>>
>> Don't do like this. Don't suppose pps interrupt is the last one.
>
>
>I don't. If the pps interrupt is named, the named interrupt will be used. If
>it is NOT named, the last interrupt is used if 2 interrupts or >3 interrupts
>are provided.
>Otherwise, no pps interrupt is assumed.
>Fortunately this seems to be true currently.
>
If the pps interrupt is not named, this forces the last interrupt to be the
pps one. We cannot derive the pps interrupt from how current chips define
their interrupts, because we never know how future chips will define them.
Although your current implementation works with current chips, it is not a
really good solution.

>
>> And if irq_cnt is 1 like imx28/imx5x, the patch will break the fec interrupt
>> function.
>
>How ?  fec_ptp_init will not be called as bufdesc_ex is 0.
>
i.MX28 also supports the enhanced buffer descriptor; if the ptp clock is
defined in the dts, then bufdesc_ex can also be 1.

I still suggest using the v1 logic, which finds the pps interrupt by checking
the irq name.
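
To show what the positional fallback under discussion implies, here is a
userspace sketch of the irq_cnt adjustment from the quoted hunk. It rests
on two assumptions that are not confirmed by the visible part of the
patch: FEC_IRQ_NUM is taken to be 3 (from "the limit of 3 interrupts"),
and fec_ptp_init() is assumed to fall back to the interrupt at the index
it is handed when no named interrupt exists. The function names are
hypothetical.

```c
#define FEC_IRQ_NUM 3	/* assumed from "the limit of 3 interrupts" */

/* Mirrors the irq_cnt adjustment in the quoted v2 hunk. */
static int fec_main_irq_count(int platform_irqs)
{
	int irq_cnt = platform_irqs;

	if (irq_cnt > FEC_IRQ_NUM)
		irq_cnt = FEC_IRQ_NUM;	/* last for ptp */
	else if (irq_cnt == 2)
		irq_cnt = 1;		/* last for ptp */
	else if (irq_cnt <= 0)
		irq_cnt = 1;		/* let the request loop fail */
	return irq_cnt;
}

/*
 * Index that would be treated as the unnamed pps interrupt, or -1 when
 * no interrupt is left over after the main ones.
 */
static int unnamed_pps_index(int platform_irqs)
{
	int idx = fec_main_irq_count(platform_irqs);

	return platform_irqs > idx ? idx : -1;
}
```

Under these assumptions a device with 2 interrupts has index 1 taken as
pps and one with 4 has index 3 taken as pps, purely by position. That is
the dependency on interrupt ordering that a lookup by name avoids.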

>
>Also, if only 1 interrupt is provided, it is assumed there is no unnamed pps
>interrupt.
>
>
>>
>> I suggest using platform_get_irq_byname() to get the pps (ptp) interrupt, like
>> your v1 logic check.
>>
>>> +
>>> if (fep->bufdesc_ex)
>>> -   fec_ptp_init(pdev);
>>> +   fec_ptp_init(pdev, irq_cnt);
>>>
>>> ret = fec_enet_init(ndev);
>>> if (ret)
>>> goto failed_init;
>>>
>>> -   for (i = 0; i < FEC_IRQ_NUM; i++) {
>>> -   irq = platform_get_irq(pdev, i);
>>> +   for (i = 0; i < irq_cnt; i++) {
>>> +   sprintf(irq_name, "int%d", i);
>>> +   irq = platform_get_irq_byname(pdev, irq_name);
>>> +   if (irq < 0)
>>> +   irq = platform_get_irq(pdev, i);
>>> if (irq < 0) {
>>> -   if (i)
>>> -   break;
>>> ret = irq;
>>> goto failed_irq;
>>> }
>>> diff --git

RE: [patch net v2 1/4] net/sched: Change tc_action refcnt and bindcnt to atomic

2017-10-17 Thread Chris Mi

> -----Original Message-----
> From: Cong Wang [mailto:xiyou.wangc...@gmail.com]
> Sent: Tuesday, October 17, 2017 11:53 PM
> To: Chris Mi 
> Cc: Linux Kernel Network Developers ; Jamal Hadi
> Salim ; Lucas Bates ; Jiri Pirko
> ; David Miller 
> Subject: Re: [patch net v2 1/4] net/sched: Change tc_action refcnt and
> bindcnt to atomic
> 
> On Mon, Oct 16, 2017 at 6:14 PM, Chris Mi  wrote:
> > I don't think this bug was introduced by the above two commits only.
> > Actually, this bug was introduced by several commits, at least the
> > following:
> > 1. refcnt and bindcnt are not atomic
> 
> Nope, non-atomic is perfectly okay as long as nothing runs in parallel, and without
> RCU callbacks they are perfectly serialized by RTNL.
Agree.
> 
> 
> > 2. passing actions using list instead of arrays (I think initially we
> > are using arrays)
> 
> We are discussing patch 1/4, this is patch 2/4, so irrelevant.
Agree.
> 
> 
> > 3. using RCU callbacks
> 
> This introduces problem 1.
I think this patch set only fixes one problem, that is, the race and the panic.
What do you mean by problem 1?
> 
> 
> > So instead of blaming the latest commit, it is better to say it is a 
> > pre-git error.
> 
> You are wrong.
OK, you are right. But could I know what your suggestion is for this patch set?
1. reject it?
2. change the "Fixes" as you suggested?
3. something else?

Thanks,
Chris

RE: [patch net v3 2/4] net/sched: Use action array instead of action list as parameter

2017-10-17 Thread Chris Mi



> -----Original Message-----
> From: Cong Wang [mailto:xiyou.wangc...@gmail.com]
> Sent: Wednesday, October 18, 2017 12:56 AM
> To: Chris Mi 
> Cc: Linux Kernel Network Developers ; Jamal Hadi
> Salim ; Lucas Bates ; Jiri Pirko
> ; David Miller 
> Subject: Re: [patch net v3 2/4] net/sched: Use action array instead of action
> list as parameter
> 
> On Mon, Oct 16, 2017 at 6:20 PM, Chris Mi  wrote:
> > When destroying filters, actions should be destroyed first.
> > The pointers of each action are saved in an array. TC doesn't use the
> > array directly, but put all actions in a doubly linked list and use
> > that list as parameter.
> >
> > There is no problem if each filter has its own actions. But if some
> > filters share the same action, when these filters are destroyed, RCU
> > callback fl_destroy_filter() may be called at the same time. That
> > means the same action's 'struct list_head list'
> > could be manipulated at the same time. It may point to an invalid
> > address so that system will panic.
> 
> So if we remove these RCU callbacks (by adding a synchronize_rcu) this is not a
> problem, right?
Maybe you are right. But do you think it will cause a performance issue? I mean,
will it take longer to destroy filters if using synchronize_rcu()?
Or are there any races other than the RCU callbacks?
We haven't found any. This is the only one we found.
> 
> 
> >
> > This patch uses the action array directly to fix this issue.
> >
> > Fixes commit in pre-git era.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> 
> This is wrong too. RCU callbacks were introduced very late.

Re: [PATCH v3 net-next] tcp: Remove use of daddr_cache in tracepoint

2017-10-17 Thread Song Liu

> > Remove use of ipv6_pinfo in favor of data in sock_common.
> >
> > Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
> > Signed-off-by: David Ahern 
> > ---
>
> Reviewed-by: Eric Dumazet 

> Thanks David !

Tested-by: Song Liu

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread Steve Grubb

On Tuesday, October 17, 2017 1:57:43 PM EDT James Bottomley wrote:
> > > > The idea is that processes spawned into a container would be
> > > > labelled by the container orchestration system.  It's unclear
> > > > what should happen to processes using nsenter after the fact, but
> > > > policy for that should be up to the orchestration system.
> > > 
> > > I'm fine with that. The user space policy can be anything y'all
> > > like.
> > 
> > I think there should be a login event.
> 
> I thought you wanted this for containers?  Container creation doesn't
> have login events.  In an unprivileged orchestration system it may be
> hard to synthetically manufacture them.

I realize this. This work is very similar to problems we solved 12 years
ago. We'll figure out what the right name is for it down the road. But the 
concept is the same. If something enters a container, we need to know about 
it. It needs to get tagged and be associated with the container. The way this 
was solved for the loginuid problem was to add a session identifier so that 
new logins of the same loginuid can coexist and we can trace actions back to a 
specific login. I'd think we can apply lessons learned from a while back to 
make container identification act similarly.

-Steve

[PATCH] dql: make dql_init return void

2017-10-17 Thread Stephen Hemminger

dql_init always returned 0, and the only place that uses it
in network core code didn't care about the return value anyway.

Signed-off-by: Stephen Hemminger 
---
 include/linux/dynamic_queue_limits.h | 2 +-
 lib/dynamic_queue_limits.c   | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index a4be70398ce1..f69f98541953 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -98,7 +98,7 @@ void dql_completed(struct dql *dql, unsigned int count);
 void dql_reset(struct dql *dql);
 
 /* Initialize dql state */
-int dql_init(struct dql *dql, unsigned hold_time);
+void dql_init(struct dql *dql, unsigned int hold_time);
 
 #endif /* _KERNEL_ */
 
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
index f346715e2255..dbe61c4c2a97 100644
--- a/lib/dynamic_queue_limits.c
+++ b/lib/dynamic_queue_limits.c
@@ -127,12 +127,11 @@ void dql_reset(struct dql *dql)
 }
 EXPORT_SYMBOL(dql_reset);
 
-int dql_init(struct dql *dql, unsigned hold_time)
+void dql_init(struct dql *dql, unsigned int hold_time)
 {
dql->max_limit = DQL_MAX_LIMIT;
dql->min_limit = 0;
dql->slack_hold_time = hold_time;
dql_reset(dql);
-   return 0;
 }
 EXPORT_SYMBOL(dql_init);
-- 
2.11.0

[PATCH] net: mac80211: mark expected switch fall-throughs

2017-10-17 Thread Gustavo A. R. Silva

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in some cases I replaced "fall through on else" and
"otherwise fall through" comments with just a "fall through" comment,
which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).
Please, verify that the actual intention of the code is to fall through.
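
For reference, the marking convention applied throughout this patch looks
like the following self-contained sketch. It is modelled on the
channel-width hunk in net/mac80211/mesh.c, but the enum and flag names
here are hypothetical stand-ins, not the mac80211 definitions.

```c
/* Hypothetical stand-ins for the channel widths and station flags. */
enum chan_width { WIDTH_20_NOHT, WIDTH_20, WIDTH_40, WIDTH_80 };

#define DISABLE_HT	0x1u
#define DISABLE_40MHZ	0x2u
#define DISABLE_VHT	0x4u

/* Narrower widths accumulate the restrictions of all wider ones. */
static unsigned int sta_flags_for_width(enum chan_width width)
{
	unsigned int flags = 0;

	switch (width) {
	case WIDTH_20_NOHT:
		flags |= DISABLE_HT;
		/* fall through */
	case WIDTH_20:
		flags |= DISABLE_40MHZ;
		/* fall through */
	case WIDTH_40:
		flags |= DISABLE_VHT;
		break;
	default:
		break;
	}
	return flags;
}
```

Each comment sits directly before the next case label, which is where the
compiler looks for it. Cases that share a body without any statement in
between (stacked labels) need no comment.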

 net/mac80211/cfg.c| 3 +++
 net/mac80211/ht.c | 1 +
 net/mac80211/iface.c  | 2 +-
 net/mac80211/mesh.c   | 2 ++
 net/mac80211/mesh_hwmp.c  | 1 +
 net/mac80211/mesh_plink.c | 2 +-
 net/mac80211/mlme.c   | 1 +
 net/mac80211/offchannel.c | 4 ++--
 net/mac80211/tdls.c   | 1 +
 net/mac80211/wme.c| 1 +
 10 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index a354f19..9bd8bef 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -573,10 +573,12 @@ static int ieee80211_get_key(struct wiphy *wiphy, struct net_device *dev,
case WLAN_CIPHER_SUITE_BIP_CMAC_256:
BUILD_BUG_ON(offsetof(typeof(kseq), ccmp) !=
 offsetof(typeof(kseq), aes_cmac));
+   /* fall through */
case WLAN_CIPHER_SUITE_BIP_GMAC_128:
case WLAN_CIPHER_SUITE_BIP_GMAC_256:
BUILD_BUG_ON(offsetof(typeof(kseq), ccmp) !=
 offsetof(typeof(kseq), aes_gmac));
+   /* fall through */
case WLAN_CIPHER_SUITE_GCMP:
case WLAN_CIPHER_SUITE_GCMP_256:
BUILD_BUG_ON(offsetof(typeof(kseq), ccmp) !=
@@ -2205,6 +2207,7 @@ static int ieee80211_scan(struct wiphy *wiphy,
 * for now fall through to allow scanning only when
 * beaconing hasn't been configured yet
 */
+   /* fall through */
case NL80211_IFTYPE_AP:
/*
 * If the scan has been forced (and the driver supports
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index 41f5e48..e55dabf 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -491,6 +491,7 @@ int ieee80211_send_smps_action(struct ieee80211_sub_if_data *sdata,
case IEEE80211_SMPS_AUTOMATIC:
case IEEE80211_SMPS_NUM_MODES:
WARN_ON(1);
+   /* fall through */
case IEEE80211_SMPS_OFF:
action_frame->u.action.u.ht_smps.smps_control =
WLAN_HT_SMPS_CONTROL_DISABLED;
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 13b16f9..435e735 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1633,7 +1633,7 @@ static void ieee80211_assign_perm_addr(struct ieee80211_local *local,
goto out_unlock;
}
}
-   /* otherwise fall through */
+   /* fall through */
default:
/* assign a new address if possible -- try n_addresses first */
for (i = 0; i < local->hw.wiphy->n_addresses; i++) {
diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index 7a76c4a..d29a545 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -988,8 +988,10 @@ ieee80211_mesh_process_chnswitch(struct ieee80211_sub_if_data *sdata,
switch (sdata->vif.bss_conf.chandef.width) {
case NL80211_CHAN_WIDTH_20_NOHT:
sta_flags |= IEEE80211_STA_DISABLE_HT;
+   /* fall through */
case NL80211_CHAN_WIDTH_20:
sta_flags |= IEEE80211_STA_DISABLE_40MHZ;
+   /* fall through */
case NL80211_CHAN_WIDTH_40:
sta_flags |= IEEE80211_STA_DISABLE_VHT;
break;
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index 146ec6c..0e75abf 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -1247,6 +1247,7 @@ void mesh_path_tx_root_frame(struct ieee80211_sub_if_data *sdata)
break;
case IEEE80211_PROACTIVE_PREQ_WITH_PREP:
flags |= IEEE80211_PREQ_PROACTIVE_PREP_FLAG;
+   /* fall through */
case IEEE80211_PROACTIVE_PREQ_NO_PREP:
interval = ifmsh->mshcfg.dot11MeshHWMPactivePathToRootTimeout;
target_flags |= IEEE80211_PREQ_TO_FLAG |
diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index e2d00cc..0f6c9ca 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -672,7 +672,7 @@ void mesh_plink_timer(struct timer_list *t)
break;
}
reason = WLAN_REASON_MESH_MAX_RETRIES;
-   /* fall through on else */
+   /* fall through */
case NL80211_PLINK_CNF_RCVD:
/* confirm timer */
if (!reason)
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index

Re: [Intel-wired-lan] [RFC PATCH next 2/2] i40e: add support for macvlan hardware offload

2017-10-17 Thread Shannon Nelson


On 10/17/2017 2:32 PM, Alexander Duyck wrote:


> So the select_queue function being needed is the deal breaker on all
> of this as far as I am concerned. We aren't allowed to use it under
> other cases so why should macvlan be an exception to the rule?


I realize that the stack is pretty good at choosing the "right" queue,
which is my understanding as to why we shouldn't use select_queue(), but 
it doesn't know how to use the accel_priv context associated with the 
macvlan offload.


I saw DaveM's guidance to the HiNIC folks when they tried to add 
select_queue(): "do not implement this function unless you absolutely 
need to do something custom in your driver".  I can see where this might 
be the exception.


When originally thinking about how to do this, I wanted to use the 
accel_priv as a pointer to the VSI to be used for the offload, then we 
could have multiple queues and use all the VSI specific tuning 
operations that XL710 has available.  It can work when selecting the 
queue, but by the time you get to start_xmit(), you no longer have that 
context and only have the queue number.  You can't do any fancy encoding 
in the queue number because the value has to be within 
dev->num_tx_queues.  Maybe we can add accel_priv to the start_xmit 
interface?  (I can hear the groans already...)


However... for our case, you might be right anyway.  If the stack is 
doing its job at keeping the conversation on the one queue/irq/cpu 
combination, any Tx following the offloaded Rx might already be headed 
for the right Tx queue.  I'll check on that.

> I think we should probably look at a different approach for this. For
> example why is it we need to use a different transmit path for a
> macvlan packet vs any other packet? On the Rx side we get the
> advantage of avoiding the software hashing and demux. What do we get
> for reserving queues for transmit?


There are a couple of reasons I can think of to keep the Tx on the 
specific queue pair:


- Keep the Tx traffic on the same CPU and irq as the Rx traffic

- Don't let the flow get interrupted, slowed, or otherwise perturbed by 
other traffic flows.


- Allow for adding hardware assisted bandwidth constraints to the 
offloaded flow without bothering the rest of the NIC's traffic


Are these enough to want to guarantee the Tx queue?


> My plan for this is to go back and "fix" ixgbe so we can get it away
> from having to use the select_queue call for the macvlan offload and
> then maybe look at providing a few select NDO operations for allowing
> macvlans that are being offloaded to make specific calls into the
> hardware to perform tasks as needed.


The ixgbe implementation can certainly be improved.  I think its biggest 
failing is that the rest of the general traffic gets constrained to a 
single queue - no more RSS for load balancing.


sln

[PATCH] net: l2tp: mark expected switch fall-through

2017-10-17 Thread Gustavo A. R. Silva

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I replaced the "NOBREAK" comment with
a "fall through" comment, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).

 net/l2tp/l2tp_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 7135f46..f517942 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -406,7 +406,7 @@ static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 portid, u32 seq, int fla
if (nla_put_u16(skb, L2TP_ATTR_UDP_SPORT, ntohs(inet->inet_sport)) ||
nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, ntohs(inet->inet_dport)))
goto nla_put_failure;
-   /* NOBREAK */
+   /* fall through  */
case L2TP_ENCAPTYPE_IP:
 #if IS_ENABLED(CONFIG_IPV6)
if (np) {
-- 
2.7.4

Re: [PATCH 0/7] Adding permanent config get/set to devlink

2017-10-17 Thread Steve Lin

My apologies - this patchset was intended for net-next; I forgot to
add that to the subject line, though.

Steve

On Tue, Oct 17, 2017 at 4:44 PM, Steve Lin  wrote:
> DIFFERENCES FROM RFC:
> Implemented most of the changes suggested by Jiri and others.
> Thanks for the valuable feedback!
>
> Adds a devlink command for getting & setting permanent
> (persistent / NVRAM) device configuration parameters, and
> enumerates the parameters as nested devlink attributes.
>
> bnxt driver patches make use of these new devlink cmds/
> attributes.
>
> Steve Lin (7):
>   devlink: Add permanent config parameter get/set operations
>   devlink: Adding NPAR permanent config parameters
>   devlink: Adding high level dev perm config params
>   devlink: Adding perm config of link settings
>   devlink: Adding pre-boot permanent config parameters
>   bnxt: Move generic devlink code to new file
>   bnxt: Add devlink support for config get/set
>
>  drivers/net/ethernet/broadcom/bnxt/Makefile   |   2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   1 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 363 
> ++
>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  56 
>  drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 ++
>  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c |  53 +---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h |  37 +--
>  include/net/devlink.h |   4 +
>  include/uapi/linux/devlink.h  | 113 +++
>  net/core/devlink.c| 300 ++
>  10 files changed, 944 insertions(+), 85 deletions(-)
>  create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
>  create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
>
> --
> 2.7.4
>

Re: using verifier to ensure a BPF program uses certain metadata?

2017-10-17 Thread Alexei Starovoitov

On Mon, Oct 16, 2017 at 09:38:44AM +0200, Johannes Berg wrote:
> Hi,
> 
> As we discussed in April already (it's really been that long...), I'd
> wanted to allow using BPF to filter wireless monitor frames, to enable
> new use cases and higher performance in monitoring. I have some code,
> at
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git/log/?h=bpf

The bpf bits look pretty straightforward.
The attach looks fine too. I'm assuming there is some rtnl or other lock,
so multiple assigns cannot race?
It's missing a query interface though.
Please add support to return prog_id.

> which implements parts of this. It's still missing the TX status path
> and perhaps associated metadata, but that part is easy.
> 
> The bigger "problem" is that we're going to be adding support for
> devices that have 802.11->Ethernet conversion already in hardware, and
> in that case the notion that the filter program will get an 802.11
> header to look at is no longer right.
> 
> Now, most likely for the actual in-service monitoring we'll actually
> have to reconstitute the 802.11 header on the fly (in pure monitoring
> where nothing else is active, we can just disable the conversion), but
> the filtering shouldn't really be reliant on that, since that's not the
> cheapest thing to do.
> 
> The obvious idea around this is to add a metadata field (just a bit
> really), something like "is_data_ethernet", saying that it was both a
> data frame and is already converted to have an Ethernet header.
> However, since these devices don't really exist yet for the vast
> majority of people, I'm a bit afraid that we'll find later a lot of
> code simply ignoring this field and looking at the "802.11" header,
> which is then broken if it encounters an Ethernet header instead.
> 
> And there lies my question: If we added a new callback to
> bpf_verifier_ops (e.g. "post_verifier_check"), to be called after the
> normal verification, and also added a context argument to
> "is_valid_access" (*), we could easily track that this new metadata
> field is accessed, and reject programs that don't access it at all.
> 
> Now, I realize that people could trivially just work around this in
> their program if they wanted, but I think most will take the reminder
> and just implement
> 
> if (ctx->is_data_ethernet)
> return DROP_FRAME;
> 
> instead, since mostly data frames will not be very relevant to them.
> 
> What do you think?

Sounds fine. Considering the new verifier ops after Jakub's refactoring,
a check that is_data_ethernet was accessed would fit nicely,
without the void** hack.
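
For illustration only, the access tracking discussed above could be modelled
in user space roughly as follows. This is a sketch, not the kernel API: the
context layout (struct mon_ctx), the state struct and both function names are
hypothetical stand-ins for "is_valid_access" with a context argument and the
proposed "post_verifier_check" callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context layout seen by a monitor filter program. */
struct mon_ctx {
	uint32_t data;
	uint32_t data_end;
	uint32_t is_data_ethernet;
};

/* Hypothetical per-verification state (the "context argument"). */
struct mon_verifier_state {
	bool saw_is_data_ethernet;
};

/* Called for every context load the verifier encounters. */
static bool mon_is_valid_access(int off, int size,
				struct mon_verifier_state *st)
{
	/* Only aligned 4-byte loads inside the context are allowed. */
	if (off < 0 || size != (int)sizeof(uint32_t) ||
	    off % size || off + size > (int)sizeof(struct mon_ctx))
		return false;

	/* Remember that the program looked at the new metadata field. */
	if (off == (int)offsetof(struct mon_ctx, is_data_ethernet))
		st->saw_is_data_ethernet = true;
	return true;
}

/* Called once after normal verification: 0 accepts, -1 rejects. */
static int mon_post_verifier_check(const struct mon_verifier_state *st)
{
	return st->saw_is_data_ethernet ? 0 : -1;
}
```

A program that never loads is_data_ethernet leaves the flag unset and is
rejected by the final check. As noted in the thread, a program can still load
the field and ignore it, so this is a reminder rather than an enforcement.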

Re: [PATCH net-next 2/3] ipv6: start fib6 gc on RTF_CACHE dst creation

2017-10-17 Thread Martin KaFai Lau

On Tue, Oct 17, 2017 at 06:35:13PM +, Wei Wang wrote:
> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> > After commit 2b760fcf5cfb ("ipv6: hook up exception
> > table to store dst cache"), the fib6 gc is not started after
> > the creation of a RTF_CACHE via a redirect or pmtu update, since
> > fib6_add() isn't invoked anymore for such dsts.
Nice catch!

Acked-by: Martin KaFai Lau 

> >
> > We need the fib6 gc to run periodically to clean the RTF_CACHE,
> > or the dst will stay there forever.
> >
> > Fix it by explicitly calling fib6_force_start_gc() on successful
> > exception creation. gc_args->more accounting will ensure that
> > the gc timer will run for whatever time needed to properly
> > clean the table.
> >
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Signed-off-by: Paolo Abeni 
> > ---
> Acked-by: Wei Wang 
> 
> Totally true. Thanks for catching this.
> 
> >  net/ipv6/route.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 5bb53dbd4fd3..8b25a31b6b03 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1340,8 +1340,10 @@ static int rt6_insert_exception(struct rt6_info *nrt,
> > spin_unlock_bh(_exception_lock);
> >
> > /* Update fn->fn_sernum to invalidate all cached dst */
> > -   if (!err)
> > +   if (!err) {
> > fib6_update_sernum(ort);
> > +   fib6_force_start_gc(net);
> > +   }
> >
> > return err;
> >  }
> > --
> > 2.13.6
> >

Re: [PATCH net-next 3/3] ipv6: obsolete cached dst when removing them from fib tree

2017-10-17 Thread Martin KaFai Lau

On Tue, Oct 17, 2017 at 06:58:23PM +, Wei Wang wrote:
> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
> > dst.obsolete when a cached route has expired").
> >
> > This change brings back the dst obsoleting and pushes it a step
> > further: cached dst are always obsoleted when removed from the
> > fib tree, and removal by time expiration is now performed
> > regardless of dst->__refcnt, to be consistent with what we
> > already do for RTF_GATEWAY dst.
> >
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Signed-off-by: Paolo Abeni 
> > ---
> >  net/ipv6/route.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 8b25a31b6b03..fce740049e3e 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct 
> > rt6_exception_bucket *bucket,
> > if (!bucket || !rt6_ex)
> > return;
> >
> > +   /* sockets, flow cache, etc. can hold a reference to this dst, be sure
> > +* they will drop it.
> > +*/
> > +   if (rt6_ex->rt6i)
> > +   rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
> > +
> 
> Hmm... I don't really think it is needed. rt6 is created with
> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
> above function is called, it should still be that value.
> Furthermore, the later call rt6_release() calls dst_dev_put() which
> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
> been removed from the tree.
> 
> > net = dev_net(rt6_ex->rt6i->dst.dev);
> > rt6_ex->rt6i->rt6i_node = NULL;
> > hlist_del_rcu(&rt6_ex->hlist);
> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct 
> > rt6_exception_bucket *bucket,
> >  {
> > struct rt6_info *rt = rt6_ex->rt6i;
> >
> > -   if (atomic_read(&rt->dst.__refcnt) == 1 &&
> > -   time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> > +   /* we are pruning and obsoleting the exception route even if others
> > +* have still reference to it, so that on next dst_check() such
> > +* reference can be dropped
> > +*/
> > +   if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> 
> Why do we want to change this behavior? Before my patch series, cached
> routes were only deleted from the tree in fib6_age() when
> rt->dst.__refcnt == 1, isn't it?
In commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route has
expired"), if obsolete is set to DST_OBSOLETE_KILL, why is the route not
removed from the tree at the same time?

> 
> > RT6_TRACE("aging clone %p\n", rt);
> > rt6_remove_exception(bucket, rt6_ex);
> > return;
> > --
> > 2.13.6
> >

Re: [PATCH v3 net-next] tcp: Remove use of daddr_cache in tracepoint

2017-10-17 Thread Eric Dumazet

On Tue, 2017-10-17 at 13:09 -0700, David Ahern wrote:
> Running perf in one window to capture tcp_retransmit_skb tracepoint:
> $ perf record -e tcp:tcp_retransmit_skb -a
> 
> And causing a retransmission on an active TCP session (e.g., dropping
> packets in the receiver, changing MTU on the interface to 500 and back
> to 1500) triggers a panic:

> Remove use of ipv6_pinfo in favor of data in sock_common.
> 
> Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
> Signed-off-by: David Ahern 
> ---

Reviewed-by: Eric Dumazet 

Thanks David !

Re: [PATCH net v2 2/2] net: fec: Let fec_ptp have its own interrupt routine

2017-10-17 Thread Troy Kisky

On 10/15/2017 8:41 PM, Andy Duan wrote:
> From: Troy Kisky  Sent: Saturday, October 14, 
> 2017 10:10 AM
>> This is better for code locality and should slightly speed up normal 
>> interrupts.
>>
>> This also allows PPS clock output to start working for i.mx7. This is because
>> i.mx7 was already using the limit of 3 interrupts, and needed another.
>>
>> Signed-off-by: Troy Kisky 
>>
>> ---
>>
>> v2: made this change independent of any devicetree change so that old dtbs
>> continue to work.
>>
>> Continue to register ptp clock if interrupt is not found.
>> ---
>> drivers/net/ethernet/freescale/fec.h  |  3 +-
>> drivers/net/ethernet/freescale/fec_main.c | 25 ++
>> drivers/net/ethernet/freescale/fec_ptp.c  | 82 ++
>> -
>> 3 files changed, 65 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/freescale/fec.h
>> b/drivers/net/ethernet/freescale/fec.h
>> index ede1876a9a19..be56ac1f1ac4 100644
>> --- a/drivers/net/ethernet/freescale/fec.h
>> +++ b/drivers/net/ethernet/freescale/fec.h
>> @@ -582,12 +582,11 @@ struct fec_enet_private {
>>  u64 ethtool_stats[0];
>> };
>>
>> -void fec_ptp_init(struct platform_device *pdev);
>> +void fec_ptp_init(struct platform_device *pdev, int irq_index);
>>  void fec_ptp_stop(struct platform_device *pdev);
>>  void fec_ptp_start_cyclecounter(struct net_device *ndev);
>>  int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
>>  int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
>> -uint fec_ptp_check_pps_event(struct fec_enet_private *fep);
>>
>>
>> /**
>> **/
>> #endif /* FEC_H */
>> diff --git a/drivers/net/ethernet/freescale/fec_main.c
>> b/drivers/net/ethernet/freescale/fec_main.c
>> index 3dc2d771a222..21afabbc560f 100644
>> --- a/drivers/net/ethernet/freescale/fec_main.c
>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>> @@ -1602,10 +1602,6 @@ fec_enet_interrupt(int irq, void *dev_id)
>>  ret = IRQ_HANDLED;
>>  complete(&fep->mdio_done);
>>  }
>> -
>> -if (fep->ptp_clock)
>> -if (fec_ptp_check_pps_event(fep))
>> -ret = IRQ_HANDLED;
>>  return ret;
>> }
>>
>> @@ -3325,6 +3321,8 @@ fec_probe(struct platform_device *pdev)
>>  struct device_node *np = pdev->dev.of_node, *phy_node;
>>  int num_tx_qs;
>>  int num_rx_qs;
>> +char irq_name[8];
>> +int irq_cnt;
>>
>>  fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
>>
>> @@ -3465,18 +3463,27 @@ fec_probe(struct platform_device *pdev)
>>  if (ret)
>>  goto failed_reset;
>>
>> +irq_cnt = platform_irq_count(pdev);
>> +if (irq_cnt > FEC_IRQ_NUM)
>> +irq_cnt = FEC_IRQ_NUM;  /* last for ptp */
>> +else if (irq_cnt == 2)
>> +irq_cnt = 1;/* last for ptp */
>> +else if (irq_cnt <= 0)
>> +irq_cnt = 1;/* Let the for loop fail */
> 
> Don't do like this. Don't suppose pps interrupt is the last one.


I don't. If the pps interrupt is named, the named interrupt will be used.
If it is NOT named, the last interrupt is used when either 2 interrupts
or more than 3 interrupts are provided. Otherwise, no pps interrupt is
assumed. Fortunately this seems to be true currently.
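
As a rough sketch of the unnamed-interrupt fallback in the quoted hunk, the
selection can be written as two small helpers. The helper names are
hypothetical, FEC_IRQ_NUM is assumed to be 3 (the "limit of 3 interrupts"
mentioned in the commit message), and treating the value passed to
fec_ptp_init() as a platform interrupt index is an assumption, since that
function's body is not shown here.

```c
#define FEC_IRQ_NUM 3	/* assumed value, see the note above */

/*
 * Number of interrupts requested by the main handler loop, given how
 * many the platform provides. Mirrors the irq_cnt logic in the hunk.
 */
static int fec_main_irq_cnt(int platform_irqs)
{
	if (platform_irqs > FEC_IRQ_NUM)
		return FEC_IRQ_NUM;	/* next one is left for ptp */
	if (platform_irqs == 2)
		return 1;		/* last one is left for ptp */
	if (platform_irqs <= 0)
		return 1;		/* let the request loop fail */
	return platform_irqs;
}

/*
 * Index of the unnamed pps interrupt, or -1 when none is assumed.
 * The index is only usable if the platform really provides it.
 */
static int fec_unnamed_pps_index(int platform_irqs)
{
	int idx = fec_main_irq_cnt(platform_irqs);

	return idx < platform_irqs ? idx : -1;
}
```

With 1 or 3 interrupts the computed index falls outside what the platform
provides, so no unnamed pps interrupt is assumed. With 2 or 4 it is the last
one. With more than 4 this sketch yields index 3, which is not the last
interrupt.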


> And if irq_cnt is 1 like imx28/imx5x,  the patch will break fec interrupt 
> function.

How ?  fec_ptp_init will not be called as bufdesc_ex is 0.


Also, if only 1 interrupt is provided, it is assumed there is no unnamed pps 
interrupt.


> 
> I suggest to use .platform_get_irq_byname() to get pps(ptp) interrupt like 
> your v1 logic check.
> 
>> +
>>  if (fep->bufdesc_ex)
>> -fec_ptp_init(pdev);
>> +fec_ptp_init(pdev, irq_cnt);
>>
>>  ret = fec_enet_init(ndev);
>>  if (ret)
>>  goto failed_init;
>>
>> -for (i = 0; i < FEC_IRQ_NUM; i++) {
>> -irq = platform_get_irq(pdev, i);
>> +for (i = 0; i < irq_cnt; i++) {
>> +sprintf(irq_name, "int%d", i);
>> +irq = platform_get_irq_byname(pdev, irq_name);
>> +if (irq < 0)
>> +irq = platform_get_irq(pdev, i);
>>  if (irq < 0) {
>> -if (i)
>> -break;
>>  ret = irq;
>>  goto failed_irq;
>>  }
>> diff --git a/drivers/net/ethernet/freescale/fec_ptp.c
>> b/drivers/net/ethernet/freescale/fec_ptp.c
>> index 6ebad3fac81d..3abeee0d16dd 100644
>> --- a/drivers/net/ethernet/freescale/fec_ptp.c
>> +++ b/drivers/net/ethernet/freescale/fec_ptp.c
>> @@ -549,6 +549,37 @@ static void fec_time_keep(struct work_struct *work)
>>  schedule_delayed_work(&fep->time_keep, HZ);
>>  }
>>
>> +/* This function checks the pps event and reloads the timer compare counter. */
>> +static irqreturn_t fec_ptp_interrupt(int irq, void *dev_id)
>> +{
>> +struct net_device *ndev =

Re: [Intel-wired-lan] [RFC PATCH next 2/2] i40e: add support for macvlan hardware offload

2017-10-17 Thread Alexander Duyck

On Tue, Oct 17, 2017 at 2:18 PM, Shannon Nelson
 wrote:
> This patch adds support for macvlan hardware offload (l2-fwd-offload)
> feature using the XL710's macvlan-to-queue filtering mechanism.  These
> are most useful for supporting separate mac addresses for Container
> virtualization using Docker and similar configurations.
>
> The basic design is to partition off some of the PF's general LAN queues
> outside of the standard RSS pool and use them as the offload queues.
> This especially makes sense on machines with more than 64 CPUs: since
> the RSS pool is limited to a maximum of 64, the queues assigned to the
> remaining CPUs essentially go unused.  When on a machine with fewer than
> 64 CPUs, we shrink the RSS pool and use the upper queues for the offload.
>
> If the user has added Flow Director filters, enabling of macvlan offload
> is disallowed.
>
> To use this feature, use ethtool to enable l2-fwd-offload
> ethtool -K ethX l2-fwd-offload on
> When the next macvlan devices are created on ethX, the macvlan driver
> will automatically attempt to set up the hardware offload.
>
> Signed-off-by: Shannon Nelson 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h |   10 +
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   15 ++
>  drivers/net/ethernet/intel/i40e/i40e_main.c|  239 
> +++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h|1 +
>  4 files changed, 264 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index a187f53..4868ae2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -365,6 +365,10 @@ struct i40e_pf {
> u8 atr_sample_rate;
> bool wol_en;
>
> +   u16 macvlan_hint;
> +   u16 macvlan_used;
> +   u16 macvlan_num;
> +
> struct hlist_head fdir_filter_list;
> u16 fdir_pf_active_filters;
> unsigned long fd_flush_timestamp;
> @@ -712,6 +716,12 @@ struct i40e_netdev_priv {
> struct i40e_vsi *vsi;
>  };
>
> +struct i40e_fwd {
> +   struct net_device *vdev;
> +   u16 tx_base_queue;
> +   /* future expansion here might include number of queues */
> +};
> +
>  /* struct that defines an interrupt vector */
>  struct i40e_q_vector {
> struct i40e_vsi *vsi;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index afd3ca8..e1628c1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -3817,6 +3817,13 @@ static int i40e_set_rxnfc(struct net_device *netdev, 
> struct ethtool_rxnfc *cmd)
> struct i40e_pf *pf = vsi->back;
> int ret = -EOPNOTSUPP;
>
> +   if (pf->macvlan_num) {
> +   dev_warn(&pf->pdev->dev,
> +"Remove %d remaining macvlan offloads to change filter options\n",
> +pf->macvlan_used);
> +   return -EBUSY;
> +   }
> +
> switch (cmd->cmd) {
> case ETHTOOL_SRXFH:
> ret = i40e_set_rss_hash_opt(pf, cmd);
> @@ -3909,6 +3916,14 @@ static int i40e_set_channels(struct net_device *dev,
> if (count > i40e_max_channels(vsi))
> return -EINVAL;
>
> +   /* verify that macvlan offloads are not in use */
> +   if (pf->macvlan_num) {
> +   dev_warn(&pf->pdev->dev,
> +"Remove %d remaining macvlan offloads to change channel count\n",
> +pf->macvlan_used);
> +   return -EBUSY;
> +   }
> +
> /* verify that the number of channels does not invalidate any current
>  * flow director rules
>  */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index e4b8a4b..7b26c6f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -9221,6 +9221,66 @@ static void i40e_clear_rss_lut(struct i40e_vsi *vsi)
>  }
>
>  /**
> + * i40e_fix_features - fix the proposed netdev feature flags
> + * @netdev: ptr to the netdev being adjusted
> + * @features: the feature set that the stack is suggesting
> + * Note: expects to be called while under rtnl_lock()
> + **/
> +static netdev_features_t i40e_fix_features(struct net_device *netdev,
> +  netdev_features_t features)
> +{
> +   struct i40e_netdev_priv *np = netdev_priv(netdev);
> +   struct i40e_pf *pf = np->vsi->back;
> +   struct i40e_vsi *vsi = np->vsi;
> +
> +   /* make sure there are queues to be used for macvlan offload */
> +   if (features & NETIF_F_HW_L2FW_DOFFLOAD &&
> +   !(netdev->features & NETIF_F_HW_L2FW_DOFFLOAD)) {
> +   const u8 drop = I40E_FILTER_PROGRAM_DESC_DEST_DROP_PACKET;
> +

[RFC PATCH next 1/2] i40e: add ToQueue specific handling for mac filters

2017-10-17 Thread Shannon Nelson

Add the concept of queue-specific filters to the filter handling.  This
will be used in the near future for macvlan offload filters.  In
general, filters for standard use will use a queue of 0, which we'll
take to mean the filter applies to the whole VSI.  Only the filters for
macvlan offload will use a non-zero queue.
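
A user-space sketch of the key layout this implies: the six MAC bytes fill
the low part of a 64-bit key and the queue index occupies the remaining two
bytes, so the same address on different queues hashes as different filters.
The function name and the use of memcpy() are illustrative choices, not the
driver's code.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Pack a MAC address and a queue index into one 64-bit hash key. */
static uint64_t addr_to_hkey(const uint8_t *macaddr, uint16_t queue)
{
	uint64_t key = 0;

	/* Bytes 0..5 hold the MAC address. */
	memcpy(&key, macaddr, ETH_ALEN);
	/* Bytes 6..7 hold the queue; 0 means "applies to the whole VSI". */
	memcpy((uint8_t *)&key + ETH_ALEN, &queue, sizeof(queue));
	return key;
}
```

With queue 0 the key's first six bytes are the address and the rest stay
zero, so standard filters hash on the address alone.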

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/i40e/i40e.h |   17 +++--
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |4 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c|   72 ---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   10 ++--
 4 files changed, 63 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 18c453a..a187f53 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -539,14 +539,17 @@ struct i40e_pf {
 /**
  * i40e_mac_to_hkey - Convert a 6-byte MAC Address to a u64 hash key
  * @macaddr: the MAC Address as the base key
+ * @queue: if non-zero, the queue to receive packets with this mac address
  *
  * Simply copies the address and returns it as a u64 for hashing
  **/
-static inline u64 i40e_addr_to_hkey(const u8 *macaddr)
+static inline u64 i40e_addr_to_hkey(const u8 *macaddr, u16 queue)
 {
u64 key = 0;
+   u16 *k = (u16 *)&key;
 
ether_addr_copy((u8 *)&key, macaddr);
+   k[3] = queue;
return key;
 }
 
@@ -563,6 +566,7 @@ struct i40e_mac_filter {
u8 macaddr[ETH_ALEN];
 #define I40E_VLAN_ANY -1
s16 vlan;
+   u16 queue;
enum i40e_filter_state state;
 };
 
@@ -892,10 +896,11 @@ int i40e_add_del_fdir(struct i40e_vsi *vsi,
 u32 i40e_get_global_fd_count(struct i40e_pf *pf);
 bool i40e_set_ntuple(struct i40e_pf *pf, netdev_features_t features);
 void i40e_set_ethtool_ops(struct net_device *netdev);
-struct i40e_mac_filter *i40e_add_filter(struct i40e_vsi *vsi,
-   const u8 *macaddr, s16 vlan);
+struct i40e_mac_filter *i40e_add_filter(struct i40e_vsi *vsi, const u8 
*macaddr,
+   s16 vlan, u16 queue);
 void __i40e_del_filter(struct i40e_vsi *vsi, struct i40e_mac_filter *f);
-void i40e_del_filter(struct i40e_vsi *vsi, const u8 *macaddr, s16 vlan);
+void i40e_del_filter(struct i40e_vsi *vsi, const u8 *macaddr,
+s16 vlan, u16 queue);
 int i40e_sync_vsi_filters(struct i40e_vsi *vsi);
 struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
u16 uplink, u32 param1);
@@ -971,8 +976,8 @@ static inline void i40e_irq_dynamic_enable(struct i40e_vsi 
*vsi, int vector)
 void i40e_rm_vlan_all_mac(struct i40e_vsi *vsi, s16 vid);
 void i40e_vsi_kill_vlan(struct i40e_vsi *vsi, u16 vid);
 struct i40e_mac_filter *i40e_add_mac_filter(struct i40e_vsi *vsi,
-   const u8 *macaddr);
-int i40e_del_mac_filter(struct i40e_vsi *vsi, const u8 *macaddr);
+   const u8 *macaddr, u16 queue);
+int i40e_del_mac_filter(struct i40e_vsi *vsi, const u8 *macaddr, u16 queue);
 bool i40e_is_vsi_in_vlan(struct i40e_vsi *vsi);
 struct i40e_mac_filter *i40e_find_mac(struct i40e_vsi *vsi, const u8 *macaddr);
 void i40e_vlan_stripping_enable(struct i40e_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 6f2725f..cf173e1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -171,8 +171,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int 
seid)
 pf->hw.mac.port_addr);
hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
dev_info(&pf->pdev->dev,
-"mac_filter_hash: %pM vid=%d, state %s\n",
-f->macaddr, f->vlan,
+"mac_filter_hash: %pM vid=%d q=%d, state %s\n",
+f->macaddr, f->vlan, f->queue,
 i40e_filter_state_string[f->state]);
}
dev_info(&pf->pdev->dev, "active_filters %u, promisc_threshold %u, overflow promisc %s\n",
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 84c5087..e4b8a4b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1114,11 +1114,13 @@ void i40e_update_stats(struct i40e_vsi *vsi)
  * @vsi: the VSI to be searched
  * @macaddr: the MAC address
  * @vlan: the vlan
+ * @queue: the queue
  *
  * Returns ptr to the filter object or NULL
  **/
 static struct i40e_mac_filter *i40e_find_filter(struct i40e_vsi *vsi,
-   const u8 *macaddr, s16 vlan)
+   const u8 *macaddr, s16 vlan,
+

[RFC PATCH next 2/2] i40e: add support for macvlan hardware offload

2017-10-17 Thread Shannon Nelson

This patch adds support for macvlan hardware offload (l2-fwd-offload)
feature using the XL710's macvlan-to-queue filtering mechanism.  These
are most useful for supporting separate mac addresses for Container
virtualization using Docker and similar configurations.

The basic design is to partition off some of the PF's general LAN queues
outside of the standard RSS pool and use them as the offload queues.
This especially makes sense on machines with more than 64 CPUs: since
the RSS pool is limited to a maximum of 64, the queues assigned to the
remaining CPUs essentially go unused.  When on a machine with fewer than
64 CPUs, we shrink the RSS pool and use the upper queues for the offload.

If the user has added Flow Director filters, enabling of macvlan offload
is disallowed.

To use this feature, use ethtool to enable l2-fwd-offload
ethtool -K ethX l2-fwd-offload on
When the next macvlan devices are created on ethX, the macvlan driver
will automatically attempt to set up the hardware offload.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/i40e/i40e.h |   10 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   15 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c|  239 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|1 +
 4 files changed, 264 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index a187f53..4868ae2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -365,6 +365,10 @@ struct i40e_pf {
u8 atr_sample_rate;
bool wol_en;
 
+   u16 macvlan_hint;
+   u16 macvlan_used;
+   u16 macvlan_num;
+
struct hlist_head fdir_filter_list;
u16 fdir_pf_active_filters;
unsigned long fd_flush_timestamp;
@@ -712,6 +716,12 @@ struct i40e_netdev_priv {
struct i40e_vsi *vsi;
 };
 
+struct i40e_fwd {
+   struct net_device *vdev;
+   u16 tx_base_queue;
+   /* future expansion here might include number of queues */
+};
+
 /* struct that defines an interrupt vector */
 struct i40e_q_vector {
struct i40e_vsi *vsi;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index afd3ca8..e1628c1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -3817,6 +3817,13 @@ static int i40e_set_rxnfc(struct net_device *netdev, 
struct ethtool_rxnfc *cmd)
struct i40e_pf *pf = vsi->back;
int ret = -EOPNOTSUPP;
 
+   if (pf->macvlan_num) {
+   dev_warn(&pf->pdev->dev,
+"Remove %d remaining macvlan offloads to change filter options\n",
+pf->macvlan_used);
+   return -EBUSY;
+   }
+
switch (cmd->cmd) {
case ETHTOOL_SRXFH:
ret = i40e_set_rss_hash_opt(pf, cmd);
@@ -3909,6 +3916,14 @@ static int i40e_set_channels(struct net_device *dev,
if (count > i40e_max_channels(vsi))
return -EINVAL;
 
+   /* verify that macvlan offloads are not in use */
+   if (pf->macvlan_num) {
+   dev_warn(&pf->pdev->dev,
+"Remove %d remaining macvlan offloads to change channel count\n",
+pf->macvlan_used);
+   return -EBUSY;
+   }
+
/* verify that the number of channels does not invalidate any current
 * flow director rules
 */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e4b8a4b..7b26c6f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9221,6 +9221,66 @@ static void i40e_clear_rss_lut(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_fix_features - fix the proposed netdev feature flags
+ * @netdev: ptr to the netdev being adjusted
+ * @features: the feature set that the stack is suggesting
+ * Note: expects to be called while under rtnl_lock()
+ **/
+static netdev_features_t i40e_fix_features(struct net_device *netdev,
+  netdev_features_t features)
+{
+   struct i40e_netdev_priv *np = netdev_priv(netdev);
+   struct i40e_pf *pf = np->vsi->back;
+   struct i40e_vsi *vsi = np->vsi;
+
+   /* make sure there are queues to be used for macvlan offload */
+   if (features & NETIF_F_HW_L2FW_DOFFLOAD &&
+   !(netdev->features & NETIF_F_HW_L2FW_DOFFLOAD)) {
+   const u8 drop = I40E_FILTER_PROGRAM_DESC_DEST_DROP_PACKET;
+   struct i40e_fdir_filter *rule;
+   struct hlist_node *node2;
+   u16 rss, unused;
+
+   /* Find a set of queues to be used for macvlan offload.
+* If there aren't many queues outside of the RSS set
+* that could be used for

[RFC PATCH next 0/2] Add support for macvlan offload

2017-10-17 Thread Shannon Nelson

The XL710 and family was originally designed as a device to support the
growing "cloud" networking needs.  With its large number of queues,
filters, VFs, and other features, it can be a very handy device for
sorting traffic in a variety of ways.  However, one early design point
was to support macvlan offloads, and this was never really worked out;
as the Intel group knows, this has bothered me for a rather long time.

The original intent was to use a separate VSI for each macvlan offloaded.
This would make multiple queues and various other features available for
the new pseudo-device.  Unfortunately, there are 2 problems with this
approach: (1) the interaction between the stack and the driver makes it
hard to figure out which VSI:queue pair to transmit through, and (2) there
are a lot more queues available for offload duties than there are VSIs.

Using a simpler design, we can partition off some of the queues in the
PF's primary VSI and use the XL710's macaddr-to-queue filtering capability
to make a large number of macvlan offload channels available.

This RFC is with code that has been shown to get packets in and out of the
right queues, but has gone through very little testing.  In the spirit
of fail fast, I wanted to get this out quickly for comments and get the
rework cycle started.

Shannon Nelson (2):
  i40e: add ToQueue specific handling for mac filters
  i40e: add support for macvlan hardware offload

 drivers/net/ethernet/intel/i40e/i40e.h |   27 ++-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |4 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   15 +
 drivers/net/ethernet/intel/i40e/i40e_main.c|  311 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   10 +-
 6 files changed, 327 insertions(+), 41 deletions(-)

Re: [PATCH net-next 3/3] ipv6: obsolete cached dst when removing them from fib tree

2017-10-17 Thread Wei Wang

On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni  wrote:
> On Tue, 2017-10-17 at 11:58 -0700, Wei Wang wrote:
>> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
>> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
>> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
>> > dst.obsolete when a cached route has expired").
>> >
>> > This change brings back the dst obsoleting and pushes it a step
>> > further: cached dst are always obsoleted when removed from the
>> > fib tree, and removal by time expiration is now performed
>> > regardless of dst->__refcnt, to be consistent with what we
>> > already do for RTF_GATEWAY dst.
>> >
>> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
>> > Signed-off-by: Paolo Abeni 
>> > ---
>> >  net/ipv6/route.c | 13 +++--
>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> > index 8b25a31b6b03..fce740049e3e 100644
>> > --- a/net/ipv6/route.c
>> > +++ b/net/ipv6/route.c
>> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct 
>> > rt6_exception_bucket *bucket,
>> > if (!bucket || !rt6_ex)
>> > return;
>> >
>> > +   /* sockets, flow cache, etc. can hold a reference to this dst, be sure
>> > +* they will drop it.
>> > +*/
>> > +   if (rt6_ex->rt6i)
>> > +   rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
>> > +
>>
>> Hmm... I don't really think it is needed. rt6 is created with
>> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
>> above function is called, it should still be that value.
>> Furthermore, the later call rt6_release() calls dst_dev_put() which
>> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
>> been removed from the tree.
>
> You are right, this looks unneeded if we keep the chunk below.
>
>> > net = dev_net(rt6_ex->rt6i->dst.dev);
>> > rt6_ex->rt6i->rt6i_node = NULL;
>> > hlist_del_rcu(&rt6_ex->hlist);
>> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct 
>> > rt6_exception_bucket *bucket,
>> >  {
>> > struct rt6_info *rt = rt6_ex->rt6i;
>> >
>> > -   if (atomic_read(&rt->dst.__refcnt) == 1 &&
>> > -   time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
>> > +   /* we are pruning and obsoleting the exception route even if others
>> > +* have still reference to it, so that on next dst_check() such
>> > +* reference can be dropped
>> > +*/
>> > +   if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
>>
>> Why do we want to change this behavior? Before my patch series, cached
>> routes were only deleted from the tree in fib6_age() when
>> rt->dst.__refcnt == 1, isn't it?
>
> yes, but that really looks like a relic from the ancient past more than
> something really needed. We already remove the dst from the fib tree
> regardless of the refcnt if the gateway validation fails - a few lines
> below in the same function.
>
> Waiting for __refcnt going down will let the kernel keep the exception
> entry around for much longer - potentially forever, if e.g. we have a
> reference in a socket dst cache and the application stops processing
> packets.
>

True. If the socket is idle and doesn't send/receive packets,
dst_check() won't get triggered and the socket will keep holding
refcnt on the obsolete dst.

> Meanwhile others sockets may grab more references to (and use) the same
> aged-out dst.
>
I don't think other sockets could grab more references to this dst
because this dst should already be removed from the fib6 tree.

> The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
> has expired") was the solution to the above issue prior to the recent
> refactor.
>

I don't really understand how this commit is solving the above issue.
This commit still only ages out the cached route if
&rt->dst.__refcnt == 1. So if a socket is holding a refcnt on this dst and
dst_check() is not getting called, this cached route still won't get deleted.

> Cheers,
>
> Paolo

[PATCH 1/7] devlink: Add permanent config parameter get/set operations

2017-10-17 Thread Steve Lin

Add support for permanent config parameter get/set commands. Used
for parameters held in NVRAM, persistent device configuration.
The config_get() and config_set() operations operate as expected, but
note that the driver implementation of the config_set() operation can
indicate whether a restart is necessary for the setting to take
effect.  This indication of a necessary restart is passed via the
DEVLINK_ATTR_PERM_CFG_RESTART_REQUIRED attribute.

First set of parameters defined are PCI SR-IOV and per-VF
configuration:

DEVLINK_ATTR_PERM_CFG_SRIOV_ENABLED: Enable SR-IOV capability.
DEVLINK_ATTR_PERM_CFG_NUM_VF_PER_PF: Maximum number of VFs per PF, in
SR-IOV mode.
DEVLINK_ATTR_PERM_CFG_MAX_NUM_PF_MSIX_VECT: Maximum number of
MSI-X vectors assigned per PF.
DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF: Number of MSI-X vectors
allocated per VF.
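
As a rough illustration of the get/set contract described above, here
is a userspace model of a driver-side implementation. Everything in it
is an assumption made for the sketch: struct demo_dev, enum demo_attr
and the demo_* functions are hypothetical, and the choice to report
"restart required" only when the stored value actually changes is one
possible policy, not something the patch mandates.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of permanent config attributes. */
enum demo_attr {
	DEMO_ATTR_SRIOV_ENABLED,
	DEMO_ATTR_NUM_VF_PER_PF,
	DEMO_ATTR_COUNT,
};

/* Hypothetical device: an array standing in for NVRAM contents. */
struct demo_dev {
	uint32_t nvram[DEMO_ATTR_COUNT];
};

/* Mirrors the shape of config_get(): read one value back. */
static int demo_config_get(struct demo_dev *dev, enum demo_attr attr,
			   uint32_t *value)
{
	if ((unsigned int)attr >= DEMO_ATTR_COUNT)
		return -EINVAL;

	*value = dev->nvram[attr];
	return 0;
}

/* Mirrors the shape of config_set(): store a value and tell the caller
 * whether a restart is needed before it takes effect.
 */
static int demo_config_set(struct demo_dev *dev, enum demo_attr attr,
			   uint32_t value, uint8_t *restart_reqd)
{
	if ((unsigned int)attr >= DEMO_ATTR_COUNT)
		return -EINVAL;

	/* Assumed policy: NVRAM is only read at boot, so any change
	 * needs a restart; rewriting the same value does not.
	 */
	*restart_reqd = dev->nvram[attr] != value;
	dev->nvram[attr] = value;
	return 0;
}
```

In the real interface the restart indication would travel back to
userspace in DEVLINK_ATTR_PERM_CFG_RESTART_REQUIRED rather than through
a local variable.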

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/net/devlink.h|   4 +
 include/uapi/linux/devlink.h |  20 
 net/core/devlink.c   | 266 +++
 3 files changed, 290 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9654e1..952966c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -270,6 +270,10 @@ struct devlink_ops {
int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode);
int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode);
+   int (*config_get)(struct devlink *devlink, enum devlink_attr attr,
+ u32 *value);
+   int (*config_set)(struct devlink *devlink, enum devlink_attr attr,
+ u32 value, u8 *restart_reqd);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 0cbca96..34de44d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -70,6 +70,9 @@ enum devlink_command {
DEVLINK_CMD_DPIPE_HEADERS_GET,
DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET,
 
+   DEVLINK_CMD_CONFIG_GET,
+   DEVLINK_CMD_CONFIG_SET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -202,6 +205,23 @@ enum devlink_attr {
 
DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */
 
+   /* Permanent Configuration Parameters */
+   DEVLINK_ATTR_PERM_CFG,  /* nested */
+
+   /* When config doesn't take effect until next reboot (config
+* just changed NVM which isn't read until boot, for example),
+* this attribute should be set by the driver.
+*/
+   DEVLINK_ATTR_PERM_CFG_RESTART_REQUIRED, /* u8 */
+   DEVLINK_ATTR_PERM_CFG_SRIOV_ENABLED,/* u8 */
+   DEVLINK_ATTR_PERM_CFG_FIRST = DEVLINK_ATTR_PERM_CFG_SRIOV_ENABLED,
+   DEVLINK_ATTR_PERM_CFG_NUM_VF_PER_PF,/* u32 */
+   DEVLINK_ATTR_PERM_CFG_MAX_NUM_PF_MSIX_VECT, /* u32 */
+   DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF,  /* u32 */
+
+   /* Add new permanent config parameters above here */
+   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF,
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7d430c1..427a65e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1566,6 +1566,254 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
return 0;
 }
 
+static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1];
+
+static int devlink_nl_sing_param_get(struct sk_buff *msg,
+struct devlink *devlink,
+enum devlink_attr attr)
+{
+   struct nla_policy policy;
+   u32 value;
+   int err;
+   const struct devlink_ops *ops = devlink->ops;
+
+   policy = devlink_nl_policy[attr];
+   err = ops->config_get(devlink, attr, &value);
+   if (err)
+   return err;
+
+   switch (policy.type) {
+   case NLA_U8:
+   err = nla_put_u8(msg, attr, value);
+   break;
+   case NLA_U16:
+   err = nla_put_u16(msg, attr, value);
+   break;
+   case NLA_U32:
+   err = nla_put_u32(msg, attr, value);
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return err;
+}
+
+static int devlink_nl_config_get_fill(struct sk_buff *msg,
+ struct devlink *devlink,
+ enum devlink_command cmd,
+ struct genl_info *info)
+{
+   const struct devlink_ops *ops = devlink->ops;
+   void *hdr;
+   int err;
+   enum devlink_attr

[PATCH 5/7] devlink: Adding pre-boot permanent config parameters

2017-10-17 Thread Steve Lin

Extending DEVLINK_ATTR_PERM_CFG (permanent/NVRAM device configuration)
to include some pre-boot device configuration settings:

DEVLINK_ATTR_PERM_CFG_MBA_ENABLED: 1 to enable Multiple Boot
Agent (MBA), 0 to disable.

DEVLINK_ATTR_PERM_CFG_MBA_BOOT_TYPE: Controls mechanism MBA will
use to insert itself into the list of devices recognized by the
BIOS; use enum devlink_mba_boot_type.

DEVLINK_ATTR_PERM_CFG_MBA_DELAY_TIME: Controls how long MBA
banner display and ability to enter MBA setup will persist
during initialization, in seconds.

DEVLINK_ATTR_PERM_CFG_MBA_SETUP_HOT_KEY: Configures which hot-key
will be used to enter MBA setup; use enum devlink_mba_setup_hot_key.

DEVLINK_ATTR_PERM_CFG_MBA_HIDE_SETUP_PROMPT: 1 to enable hiding
of 'enter setup' prompt during initialization, 0 to disable.

DEVLINK_ATTR_PERM_CFG_MBA_BOOT_RETRY_COUNT: MBA retries booting
this number of times, if it fails initially.

DEVLINK_ATTR_PERM_CFG_MBA_VLAN_ENABLED: 1 to enable using VLAN
when executing MBA host software (PXE/iSCSI), 0 to disable.

DEVLINK_ATTR_PERM_CFG_MBA_VLAN_TAG: The 16 bit VLAN tag to use
if MBA_VLAN_ENABLED is set.

DEVLINK_ATTR_PERM_CFG_MBA_BOOT_PROTOCOL: Selects MBA boot
protocol; use enum devlink_mba_boot_protocol.

DEVLINK_ATTR_PERM_CFG_MBA_LINK_SPEED: Configured link speed
while executing MBA host software (PXE/iSCSI); use enum
devlink_mba_link_speed.
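
To make the value ranges above concrete, the following sketch checks a
set of MBA settings before they would be handed to a driver. The enums
are copied from this patch; struct demo_mba_cfg and
demo_mba_cfg_valid() are hypothetical names introduced only for
illustration, and the patch itself does not perform this validation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Enum values as defined in this patch. */
enum devlink_mba_boot_type {
	DEVLINK_MBA_BOOT_TYPE_AUTO_DETECT,
	DEVLINK_MBA_BOOT_TYPE_BBS,
	DEVLINK_MBA_BOOT_TYPE_INTR18,
	DEVLINK_MBA_BOOT_TYPE_INTR19,
};

enum devlink_mba_boot_protocol {
	DEVLINK_MBA_BOOT_PROTOCOL_PXE,
	DEVLINK_MBA_BOOT_PROTOCOL_ISCSI,
	DEVLINK_MBA_BOOT_PROTOCOL_NONE = 0x7,
};

enum devlink_mba_link_speed {
	DEVLINK_MBA_LINK_SPEED_AUTONEG,
	DEVLINK_MBA_LINK_SPEED_1G,
	DEVLINK_MBA_LINK_SPEED_10G,
	DEVLINK_MBA_LINK_SPEED_25G,
	DEVLINK_MBA_LINK_SPEED_40G,
	DEVLINK_MBA_LINK_SPEED_50G,
};

/* Hypothetical container for a few of the MBA settings. */
struct demo_mba_cfg {
	uint8_t enabled;	/* 0 or 1 */
	uint32_t boot_type;	/* enum devlink_mba_boot_type */
	uint32_t boot_protocol;	/* enum devlink_mba_boot_protocol */
	uint32_t link_speed;	/* enum devlink_mba_link_speed */
	uint8_t vlan_enabled;	/* 0 or 1 */
	uint16_t vlan_tag;	/* 16-bit tag, used if vlan_enabled */
};

/* Returns true if every field holds a value the enums above define. */
static bool demo_mba_cfg_valid(const struct demo_mba_cfg *c)
{
	if (c->enabled > 1 || c->vlan_enabled > 1)
		return false;
	if (c->boot_type > DEVLINK_MBA_BOOT_TYPE_INTR19)
		return false;
	if (c->link_speed > DEVLINK_MBA_LINK_SPEED_50G)
		return false;

	/* The protocol enum is sparse: 0, 1 and 0x7 are the only values. */
	switch (c->boot_protocol) {
	case DEVLINK_MBA_BOOT_PROTOCOL_PXE:
	case DEVLINK_MBA_BOOT_PROTOCOL_ISCSI:
	case DEVLINK_MBA_BOOT_PROTOCOL_NONE:
		return true;
	default:
		return false;
	}
}
```

Note that a plain upper-bound check would not be enough for the boot
protocol, since the values between ISCSI and NONE are undefined.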

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 39 ++-
 net/core/devlink.c   | 10 ++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 2e1c006..609784a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -162,6 +162,33 @@ enum devlink_pre_os_link_speed {
DEVLINK_PRE_OS_LINK_SPEED_100M = 0xf,
 };
 
+enum devlink_mba_boot_type {
+   DEVLINK_MBA_BOOT_TYPE_AUTO_DETECT,
+   DEVLINK_MBA_BOOT_TYPE_BBS,  /* BIOS Boot Specification */
+   DEVLINK_MBA_BOOT_TYPE_INTR18,   /* Hook interrupt 0x18 */
+   DEVLINK_MBA_BOOT_TYPE_INTR19,   /* Hook interrupt 0x19 */
+};
+
+enum devlink_mba_setup_hot_key {
+   DEVLINK_MBA_SETUP_HOT_KEY_CTRL_S,
+   DEVLINK_MBA_SETUP_HOT_KEY_CTRL_B,
+};
+
+enum devlink_mba_boot_protocol {
+   DEVLINK_MBA_BOOT_PROTOCOL_PXE,
+   DEVLINK_MBA_BOOT_PROTOCOL_ISCSI,
+   DEVLINK_MBA_BOOT_PROTOCOL_NONE = 0x7,
+};
+
+enum devlink_mba_link_speed {
+   DEVLINK_MBA_LINK_SPEED_AUTONEG,
+   DEVLINK_MBA_LINK_SPEED_1G,
+   DEVLINK_MBA_LINK_SPEED_10G,
+   DEVLINK_MBA_LINK_SPEED_25G,
+   DEVLINK_MBA_LINK_SPEED_40G,
+   DEVLINK_MBA_LINK_SPEED_50G,
+};
+
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -274,9 +301,19 @@ enum devlink_attr {
DEVLINK_ATTR_PERM_CFG_PHY_SELECT,   /* u8 */
DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D0, /* u32 */
DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3, /* u32 */
+   DEVLINK_ATTR_PERM_CFG_MBA_ENABLED,  /* u8 */
+   DEVLINK_ATTR_PERM_CFG_MBA_BOOT_TYPE,/* u32 */
+   DEVLINK_ATTR_PERM_CFG_MBA_DELAY_TIME,   /* u32 */
+   DEVLINK_ATTR_PERM_CFG_MBA_SETUP_HOT_KEY,/* u32 */
+   DEVLINK_ATTR_PERM_CFG_MBA_HIDE_SETUP_PROMPT,/* u8 */
+   DEVLINK_ATTR_PERM_CFG_MBA_BOOT_RETRY_COUNT, /* u32 */
+   DEVLINK_ATTR_PERM_CFG_MBA_VLAN_ENABLED, /* u8 */
+   DEVLINK_ATTR_PERM_CFG_MBA_VLAN_TAG, /* u16 */
+   DEVLINK_ATTR_PERM_CFG_MBA_BOOT_PROTOCOL,/* u32 */
+   DEVLINK_ATTR_PERM_CFG_MBA_LINK_SPEED,   /* u32 */
 
/* Add new permanent config parameters above here */
-   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3,
+   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_MBA_LINK_SPEED,
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 80a2a50..2eaa566 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2567,6 +2567,16 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_PERM_CFG_PHY_SELECT] = { .type = NLA_U8 },
[DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D0] = { .type = NLA_U32 },
[DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_MBA_ENABLED] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_MBA_BOOT_TYPE] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_MBA_DELAY_TIME] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_MBA_SETUP_HOT_KEY] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_MBA_HIDE_SETUP_PROMPT] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_MBA_BOOT_RETRY_COUNT] = { .type = NLA_U32 },
+

[PATCH 6/7] bnxt: Move generic devlink code to new file

2017-10-17 Thread Steve Lin

Moving generic devlink code (registration) out of VF-R code
into new bnxt_devlink file.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 65 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h | 39 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c | 53 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h | 37 ++---
 6 files changed, 112 insertions(+), 85 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 457201f..59c8ec9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5ba4993..52cc38d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -61,6 +61,7 @@
 #include "bnxt_xdp.h"
 #include "bnxt_vfr.h"
 #include "bnxt_tc.h"
+#include "bnxt_devlink.h"
 
 #define BNXT_TX_TIMEOUT(5 * HZ)
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
new file mode 100644
index 0000000..f3f6aa8
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -0,0 +1,65 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2017 Broadcom Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include "bnxt_hsi.h"
+#include "bnxt.h"
+#include "bnxt_vfr.h"
+#include "bnxt_devlink.h"
+
+static const struct devlink_ops bnxt_dl_ops = {
+#ifdef CONFIG_BNXT_SRIOV
+   .eswitch_mode_set = bnxt_dl_eswitch_mode_set,
+   .eswitch_mode_get = bnxt_dl_eswitch_mode_get,
+#endif /* CONFIG_BNXT_SRIOV */
+};
+
+int bnxt_dl_register(struct bnxt *bp)
+{
+   struct devlink *dl;
+   int rc;
+
+   if (!pci_find_ext_capability(bp->pdev, PCI_EXT_CAP_ID_SRIOV))
+   return 0;
+
+   if (bp->hwrm_spec_code < 0x10800) {
+   netdev_warn(bp->dev, "Firmware does not support SR-IOV E-Switch SWITCHDEV mode.\n");
+   return -ENOTSUPP;
+   }
+
+   dl = devlink_alloc(&bnxt_dl_ops, sizeof(struct bnxt_dl));
+   if (!dl) {
+   netdev_warn(bp->dev, "devlink_alloc failed");
+   return -ENOMEM;
+   }
+
+   bnxt_link_bp_to_dl(bp, dl);
+   bp->eswitch_mode = DEVLINK_ESWITCH_MODE_LEGACY;
+   rc = devlink_register(dl, &bp->pdev->dev);
+   if (rc) {
+   bnxt_link_bp_to_dl(bp, NULL);
+   devlink_free(dl);
+   netdev_warn(bp->dev, "devlink_register failed. rc=%d", rc);
+   return rc;
+   }
+
+   return 0;
+}
+
+void bnxt_dl_unregister(struct bnxt *bp)
+{
+   struct devlink *dl = bp->dl;
+
+   if (!dl)
+   return;
+
+   devlink_unregister(dl);
+   devlink_free(dl);
+}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
new file mode 100644
index 0000000..e92a35d
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -0,0 +1,39 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2017 Broadcom Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef BNXT_DEVLINK_H
+#define BNXT_DEVLINK_H
+
+/* Struct to hold housekeeping info needed by devlink interface */
+struct bnxt_dl {
+   struct bnxt *bp;/* back ptr to the controlling dev */
+};
+
+static inline struct bnxt *bnxt_get_bp_from_dl(struct devlink *dl)
+{
+   return ((struct bnxt_dl *)devlink_priv(dl))->bp;
+}
+
+/* To clear devlink pointer from bp, pass NULL dl */
+static inline void bnxt_link_bp_to_dl(struct bnxt *bp, struct devlink *dl)
+{
+   bp->dl = dl;
+
+   /* add a back pointer in dl to bp */
+   if (dl) {
+   struct bnxt_dl *bp_dl = devlink_priv(dl);
+
+   bp_dl->bp = bp;
+   }
+}
+
+int bnxt_dl_register(struct bnxt *bp);
+void bnxt_dl_unregister(struct bnxt *bp);

[PATCH 7/7] bnxt: Add devlink support for config get/set

2017-10-17 Thread Steve Lin

Implements get and set of configuration parameters using new devlink
config get/set API.
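
The driver keeps its supported parameters in a table, so a lookup plus
range check can be sketched as below. This is a guess at the intent,
not the driver's code: struct demo_cfgparam and the demo_* names are
hypothetical, and reading the two numeric columns of the table in this
patch as "value width in bits" and "NVM parameter number" is an
assumption drawn from the initialisers alone.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few devlink attributes. */
enum {
	DEMO_ATTR_SRIOV_ENABLED = 1,
	DEMO_ATTR_MBA_VLAN_TAG,
	DEMO_ATTR_MSIX_VECTORS_PER_VF,
};

/* Hypothetical table entry; field meanings are assumed, see above. */
struct demo_cfgparam {
	int attr;		/* which attribute this row describes */
	unsigned int bits;	/* assumed: width of the value in bits */
	unsigned int nvm_param;	/* assumed: NVM parameter number */
};

/* Widths and parameter numbers taken from the table in this patch. */
static const struct demo_cfgparam demo_table[] = {
	{ DEMO_ATTR_SRIOV_ENABLED, 1, 401 },
	{ DEMO_ATTR_MBA_VLAN_TAG, 16, 357 },
	{ DEMO_ATTR_MSIX_VECTORS_PER_VF, 10, 406 },
};

/* Linear search; returns NULL if the attribute is not supported. */
static const struct demo_cfgparam *demo_find(int attr)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].attr == attr)
			return &demo_table[i];
	}
	return NULL;
}

/* Rejects unknown attributes and values wider than the row allows. */
static int demo_check_value(int attr, uint32_t value)
{
	const struct demo_cfgparam *p = demo_find(attr);

	if (!p)
		return -EINVAL;
	if (p->bits < 32 && (value >> p->bits) != 0)
		return -ERANGE;
	return 0;
}
```

A table-driven layout like this keeps per-parameter knowledge in one
place, so adding a parameter means adding a row rather than a new
branch in the get and set paths.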

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 310 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  17 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 +++
 3 files changed, 421 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index f3f6aa8..e247cae 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -14,11 +14,309 @@
 #include "bnxt_vfr.h"
 #include "bnxt_devlink.h"
 
-static const struct devlink_ops bnxt_dl_ops = {
+struct bnxt_drv_cfgparam bnxt_drv_cfgparam_list[] = {
+   {DEVLINK_ATTR_PERM_CFG_MAX_NUM_PF_MSIX_VECT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 10, 108},
+   {DEVLINK_ATTR_PERM_CFG_IGNORE_ARI_CAPABILITY, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 164},
+   {DEVLINK_ATTR_PERM_CFG_PME_CAPABILITY_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 166},
+   {DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_BRIDGE_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 269},
+   {DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_NONTPMR_BRIDGE_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 270},
+   {DEVLINK_ATTR_PERM_CFG_SECURE_NIC_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 162},
+   {DEVLINK_ATTR_PERM_CFG_PHY_SELECT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 329},
+   {DEVLINK_ATTR_PERM_CFG_SRIOV_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 401},
+
+   {DEVLINK_ATTR_PERM_CFG_MBA_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 351},
+   {DEVLINK_ATTR_PERM_CFG_MBA_BOOT_TYPE, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 2, 352},
+   {DEVLINK_ATTR_PERM_CFG_MBA_DELAY_TIME, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 4, 353},
+   {DEVLINK_ATTR_PERM_CFG_MBA_SETUP_HOT_KEY, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 354},
+   {DEVLINK_ATTR_PERM_CFG_MBA_HIDE_SETUP_PROMPT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 355},
+   {DEVLINK_ATTR_PERM_CFG_MBA_VLAN_TAG, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 16, 357},
+   {DEVLINK_ATTR_PERM_CFG_MBA_VLAN_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 358},
+   {DEVLINK_ATTR_PERM_CFG_MBA_LINK_SPEED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 4, 359},
+   {DEVLINK_ATTR_PERM_CFG_MBA_BOOT_RETRY_COUNT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 3, 360},
+   {DEVLINK_ATTR_PERM_CFG_MBA_BOOT_PROTOCOL, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 3, 361},
+   {DEVLINK_ATTR_PERM_CFG_NUM_VF_PER_PF, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 8, 404},
+   {DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 10, 406},
+   {DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 10, 501},
+   {DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 10, 502},
+   {DEVLINK_ATTR_PERM_CFG_RDMA_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 506},
+   {DEVLINK_ATTR_PERM_CFG_NPAR_BW_IN_PERCENT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 507},
+   {DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION_VALID, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 508},
+   {DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 1, 509},
+
+   {DEVLINK_ATTR_PERM_CFG_MAGIC_PACKET_WOL_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 1, 152},
+   {DEVLINK_ATTR_PERM_CFG_DCBX_MODE, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 4, 155},
+   {DEVLINK_ATTR_PERM_CFG_MULTIFUNC_MODE, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 5, 157},
+   {DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D0, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 4, 205},
+   {DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 1, 208},
+   {DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 4, 210},
+   {DEVLINK_ATTR_PERM_CFG_MEDIA_AUTO_DETECT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 1, 213},
+   {DEVLINK_ATTR_PERM_CFG_AUTONEG_PROTOCOL, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 8, 312},
+   {DEVLINK_ATTR_PERM_CFG_NPAR_NUM_PARTITIONS_PER_PORT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_PORT, 8, 503},
+};
+
+#define BNXT_NUM_DRV_CFGPARAM ARRAY_SIZE(bnxt_drv_cfgparam_list)
+
+static int bnxt_nvm_read(struct bnxt *bp, int nvm_param, int idx,
+void *buf, int size)
+{
+

[PATCH 3/7] devlink: Adding high level dev perm config params

2017-10-17 Thread Steve Lin

Extending DEVLINK_ATTR_PERM_CFG (permanent/NVRAM device configuration)
to include some high level device configuration settings:

DEVLINK_ATTR_PERM_CFG_DCBX_MODE: Data Center Bridging Exchange
(DCBX) protocol mode; use enum devlink_dcbx_mode.

DEVLINK_ATTR_PERM_CFG_RDMA_ENABLED: 1 to enable RDMA, 0 to disable.

DEVLINK_ATTR_PERM_CFG_MULTIFUNC_MODE: Configure multi-function
mode; use enum devlink_multifunc_mode.

DEVLINK_ATTR_PERM_CFG_SECURE_NIC_ENABLED: 1 to enable Secure NIC
functionality, 0 to disable.

DEVLINK_ATTR_PERM_CFG_IGNORE_ARI_CAPABILITY: 1 to ignore ARI
(Alternate Routing ID) interpretation, 0 to honor ARI.

DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_BRIDGE_ENABLED: 1 to enable
Link Layer Discovery Protocol (LLDP) on Nearest Bridge, 0 to
disable.

DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_NONTPMR_BRIDGE_ENABLED: 1 to
enable Link Layer Discovery Protocol (LLDP) on Non Two Port MAC
Relay (non-TPMR) Bridge, 0 to disable.

DEVLINK_ATTR_PERM_CFG_PME_CAPABILITY_ENABLED: 1 to enable Power
Management Events (PME) functionality, 0 to disable.

DEVLINK_ATTR_PERM_CFG_MAGIC_PACKET_WOL_ENABLED: 1 to enable
Magic Packet Wake-on-LAN using ACPI pattern, 0 to disable.

DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED: 1 to enable Energy
Efficient Ethernet (EEE), 0 to disable.
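
One observation on the DCBX mode values, sketched in C: as declared in
this patch, the enum happens to number its members so that IEEE is bit
0 and CEE is bit 1, with IEEE_CEE being both. Treating the value as a
bitmask is an interpretation of the numbering, not something the patch
states, and the demo_* helpers are hypothetical.

```c
#include <stdbool.h>

/* Enum as defined in this patch: values 0, 1, 2, 3 in that order. */
enum devlink_dcbx_mode {
	DEVLINK_DCBX_MODE_DISABLED,
	DEVLINK_DCBX_MODE_IEEE,
	DEVLINK_DCBX_MODE_CEE,
	DEVLINK_DCBX_MODE_IEEE_CEE,
};

/* True if the value is one of the four defined modes. */
static bool demo_dcbx_valid(unsigned int mode)
{
	return mode <= DEVLINK_DCBX_MODE_IEEE_CEE;
}

/* True if the mode includes IEEE DCBX (assumed: bit 0). */
static bool demo_dcbx_uses_ieee(unsigned int mode)
{
	return demo_dcbx_valid(mode) && (mode & DEVLINK_DCBX_MODE_IEEE) != 0;
}

/* True if the mode includes CEE DCBX (assumed: bit 1). */
static bool demo_dcbx_uses_cee(unsigned int mode)
{
	return demo_dcbx_valid(mode) && (mode & DEVLINK_DCBX_MODE_CEE) != 0;
}
```

If a later revision reorders the enum, helpers like these would break,
which is one argument for an explicit switch over the named values
instead.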

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 27 ++-
 net/core/devlink.c   | 12 
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 21cfb37..4a9eafd 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -127,6 +127,21 @@ enum devlink_eswitch_encap_mode {
DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
 };
 
+enum devlink_dcbx_mode {
+   DEVLINK_DCBX_MODE_DISABLED,
+   DEVLINK_DCBX_MODE_IEEE,
+   DEVLINK_DCBX_MODE_CEE,
+   DEVLINK_DCBX_MODE_IEEE_CEE,
+};
+
+enum devlink_multifunc_mode {
+   DEVLINK_MULTIFUNC_MODE_ALLOWED, /* Ext switch activates MF */
+   DEVLINK_MULTIFUNC_MODE_FORCE_SINGFUNC,
+   DEVLINK_MULTIFUNC_MODE_NPAR10,  /* NPAR 1.0 */
+   DEVLINK_MULTIFUNC_MODE_NPAR15,  /* NPAR 1.5 */
+   DEVLINK_MULTIFUNC_MODE_NPAR20,  /* NPAR 2.0 */
+};
+
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -224,9 +239,19 @@ enum devlink_attr {
DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION_VALID,/* u8 */
DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT,/* u32 */
DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID,  /* u8 */
+   DEVLINK_ATTR_PERM_CFG_DCBX_MODE,/* u32 */
+   DEVLINK_ATTR_PERM_CFG_RDMA_ENABLED, /* u8 */
+   DEVLINK_ATTR_PERM_CFG_MULTIFUNC_MODE,   /* u32 */
+   DEVLINK_ATTR_PERM_CFG_SECURE_NIC_ENABLED,   /* u8 */
+   DEVLINK_ATTR_PERM_CFG_IGNORE_ARI_CAPABILITY,/* u8 */
+   DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_BRIDGE_ENABLED,  /* u8 */
+   DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_NONTPMR_BRIDGE_ENABLED,  /* u8 */
+   DEVLINK_ATTR_PERM_CFG_PME_CAPABILITY_ENABLED,   /* u8 */
+   DEVLINK_ATTR_PERM_CFG_MAGIC_PACKET_WOL_ENABLED, /* u8 */
+   DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED, /* u8 */
 
/* Add new permanent config parameters above here */
-   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID,
+   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED,
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 76bb6d4..d611154 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2550,6 +2550,18 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION_VALID] = { .type = NLA_U8 },
[DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT] = { .type = NLA_U32 },
[DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_DCBX_MODE] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_RDMA_ENABLED] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_MULTIFUNC_MODE] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_SECURE_NIC_ENABLED] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_IGNORE_ARI_CAPABILITY] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_BRIDGE_ENABLED] = {
+   .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_LLDP_NEAREST_NONTPMR_BRIDGE_ENABLED] = {
+   .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_PME_CAPABILITY_ENABLED] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_MAGIC_PACKET_WOL_ENABLED] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
-- 
2.7.4

[PATCH 0/7] Adding permanent config get/set to devlink

2017-10-17 Thread Steve Lin

DIFFERENCES FROM RFC:
Implemented most of the changes suggested by Jiri and others.
Thanks for the valuable feedback!

Adds a devlink command for getting & setting permanent
(persistent / NVRAM) device configuration parameters, and
enumerates the parameters as nested devlink attributes.

bnxt driver patches make use of these new devlink cmds/
attributes.

Steve Lin (7):
  devlink: Add permanent config parameter get/set operations
  devlink: Adding NPAR permanent config parameters
  devlink: Adding high level dev perm config params
  devlink: Adding perm config of link settings
  devlink: Adding pre-boot permanent config parameters
  bnxt: Move generic devlink code to new file
  bnxt: Add devlink support for config get/set

 drivers/net/ethernet/broadcom/bnxt/Makefile   |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 363 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  56 
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c |  53 +---
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h |  37 +--
 include/net/devlink.h |   4 +
 include/uapi/linux/devlink.h  | 113 +++
 net/core/devlink.c| 300 ++
 10 files changed, 944 insertions(+), 85 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h

-- 
2.7.4

[PATCH 4/7] devlink: Adding perm config of link settings

2017-10-17 Thread Steve Lin

Extending DEVLINK_ATTR_PERM_CFG (permanent/NVRAM device configuration)
to include persistent configuration of device link settings:

DEVLINK_ATTR_PERM_CFG_AUTONEG_PROTOCOL: Configure default autoneg
protocol; use enum devlink_autoneg_protocol.

DEVLINK_ATTR_PERM_CFG_MEDIA_AUTO_DETECT: Configure default
auto-detection of attached media connector (1 = enable, 0 =
disable).

DEVLINK_ATTR_PERM_CFG_PHY_SELECT: Configure default external PHY
selection (0 = PHY 0, 1 = PHY 1).

DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D0: Configure default
pre-OS link speed in full power (D0) state; use enum
devlink_pre_os_link_speed.

DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3: Configure default
pre-OS link speed in sleep (D3) state; use enum
devlink_pre_os_link_speed.
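
Because enum devlink_pre_os_link_speed is sparse (5G and 100M sit at
0xe and 0xf), a consumer cannot index it directly. A possible decoding
helper is sketched here; the enum is copied from this patch, while
demo_pre_os_speed_mbps() and its return convention (0 for autoneg, -1
for an undefined value) are assumptions made for the example.

```c
/* Enum as defined in this patch. */
enum devlink_pre_os_link_speed {
	DEVLINK_PRE_OS_LINK_SPEED_AUTONEG,
	DEVLINK_PRE_OS_LINK_SPEED_1G,
	DEVLINK_PRE_OS_LINK_SPEED_10G,
	DEVLINK_PRE_OS_LINK_SPEED_25G,
	DEVLINK_PRE_OS_LINK_SPEED_40G,
	DEVLINK_PRE_OS_LINK_SPEED_50G,
	DEVLINK_PRE_OS_LINK_SPEED_100G,
	DEVLINK_PRE_OS_LINK_SPEED_5G = 0xe,
	DEVLINK_PRE_OS_LINK_SPEED_100M = 0xf,
};

/* Returns the speed in Mbps, 0 for autoneg, -1 for an undefined value. */
static long demo_pre_os_speed_mbps(unsigned int speed)
{
	switch (speed) {
	case DEVLINK_PRE_OS_LINK_SPEED_AUTONEG:
		return 0;
	case DEVLINK_PRE_OS_LINK_SPEED_1G:
		return 1000;
	case DEVLINK_PRE_OS_LINK_SPEED_10G:
		return 10000;
	case DEVLINK_PRE_OS_LINK_SPEED_25G:
		return 25000;
	case DEVLINK_PRE_OS_LINK_SPEED_40G:
		return 40000;
	case DEVLINK_PRE_OS_LINK_SPEED_50G:
		return 50000;
	case DEVLINK_PRE_OS_LINK_SPEED_100G:
		return 100000;
	case DEVLINK_PRE_OS_LINK_SPEED_5G:
		return 5000;
	case DEVLINK_PRE_OS_LINK_SPEED_100M:
		return 100;
	default:
		return -1;
	}
}
```

The same helper would serve both the D0 and D3 attributes, since they
share the enum.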

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 27 ++-
 net/core/devlink.c   |  5 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 4a9eafd..2e1c006 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -142,6 +142,26 @@ enum devlink_multifunc_mode {
DEVLINK_MULTIFUNC_MODE_NPAR20,  /* NPAR 2.0 */
 };
 
+enum devlink_autoneg_protocol {
+   DEVLINK_AUTONEG_PROTOCOL_IEEE8023BY_BAM,
+   DEVLINK_AUTONEG_PROTOCOL_IEEE8023BY_CONSORTIUM,
+   DEVLINK_AUTONEG_PROTOCOL_IEEE8023BY,
+   DEVLINK_AUTONEG_PROTOCOL_BAM,   /* Broadcom Autoneg Mode */
+   DEVLINK_AUTONEG_PROTOCOL_CONSORTIUM,/* Consortium Autoneg Mode */
+};
+
+enum devlink_pre_os_link_speed {
+   DEVLINK_PRE_OS_LINK_SPEED_AUTONEG,
+   DEVLINK_PRE_OS_LINK_SPEED_1G,
+   DEVLINK_PRE_OS_LINK_SPEED_10G,
+   DEVLINK_PRE_OS_LINK_SPEED_25G,
+   DEVLINK_PRE_OS_LINK_SPEED_40G,
+   DEVLINK_PRE_OS_LINK_SPEED_50G,
+   DEVLINK_PRE_OS_LINK_SPEED_100G,
+   DEVLINK_PRE_OS_LINK_SPEED_5G = 0xe,
+   DEVLINK_PRE_OS_LINK_SPEED_100M = 0xf,
+};
+
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -249,9 +269,14 @@ enum devlink_attr {
DEVLINK_ATTR_PERM_CFG_PME_CAPABILITY_ENABLED,   /* u8 */
DEVLINK_ATTR_PERM_CFG_MAGIC_PACKET_WOL_ENABLED, /* u8 */
DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED, /* u8 */
+   DEVLINK_ATTR_PERM_CFG_AUTONEG_PROTOCOL, /* u32 */
+   DEVLINK_ATTR_PERM_CFG_MEDIA_AUTO_DETECT,/* u8 */
+   DEVLINK_ATTR_PERM_CFG_PHY_SELECT,   /* u8 */
+   DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D0, /* u32 */
+   DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3, /* u32 */
 
/* Add new permanent config parameters above here */
-   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED,
+   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3,
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index d611154..80a2a50 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2562,6 +2562,11 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_PERM_CFG_PME_CAPABILITY_ENABLED] = { .type = NLA_U8 },
[DEVLINK_ATTR_PERM_CFG_MAGIC_PACKET_WOL_ENABLED] = { .type = NLA_U8 },
[DEVLINK_ATTR_PERM_CFG_EEE_PWR_SAVE_ENABLED] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_AUTONEG_PROTOCOL] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_MEDIA_AUTO_DETECT] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_PHY_SELECT] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D0] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_PRE_OS_LINK_SPEED_D3] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
-- 
2.7.4

[PATCH 2/7] devlink: Adding NPAR permanent config parameters

2017-10-17 Thread Steve Lin

Extending DEVLINK_ATTR_PERM_CFG (permanent/NVRAM device configuration)
to include NPAR settings:

DEVLINK_ATTR_PERM_CFG_NPAR_NUM_PARTITIONS_PER_PORT: Number of NIC
Partitions (NPAR) per port.

DEVLINK_ATTR_PERM_CFG_NPAR_BW_IN_PERCENT: 1 if BW_RESERVATION and
BW_LIMIT are in percent; 0 if BW_RESERVATION and BW_LIMIT are in
100 Mbps units.

DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION: Configures NPAR bandwidth
or weight reservation, in percent or 100 Mbps units, depending on
BW_IN_PERCENT.

DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION_VALID: 1 to use
BW_RESERVATION setting, 0 to ignore.

DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT: Configures NPAR bandwidth or
weight limit, in percent or 100 Mbps units, depending on
BW_IN_PERCENT.

DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID: 1 to use BW_LIMIT
setting, 0 to ignore.
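
How the three limit-related settings combine can be shown with a small
worked example. This is a sketch under stated assumptions:
struct demo_npar_bw and demo_npar_limit_mbps() are hypothetical, and
treating an ignored limit as "no cap below link speed" is a guess at
what "ignore" means in practice.

```c
#include <stdint.h>

/* Hypothetical container for the limit-related NPAR settings. */
struct demo_npar_bw {
	uint32_t bw_in_percent;	/* 1: percent, 0: units of 100 Mbps */
	uint32_t bw_limit;	/* meaning depends on bw_in_percent */
	uint8_t bw_limit_valid;	/* 1: use bw_limit, 0: ignore it */
};

/* Returns the effective bandwidth cap in Mbps for one partition. */
static uint32_t demo_npar_limit_mbps(const struct demo_npar_bw *c,
				     uint32_t link_mbps)
{
	/* Assumed: an ignored limit means the partition is uncapped. */
	if (!c->bw_limit_valid)
		return link_mbps;

	/* Percent of link speed; 64-bit math avoids overflow. */
	if (c->bw_in_percent)
		return (uint32_t)((uint64_t)link_mbps * c->bw_limit / 100);

	/* Absolute value in units of 100 Mbps. */
	return c->bw_limit * 100;
}
```

On a 10 Gbps link, a limit of 25 works out to 2500 Mbps in either
encoding: 25 percent of 10000 Mbps, or 25 units of 100 Mbps.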

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 8 +++-
 net/core/devlink.c   | 7 +++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 34de44d..21cfb37 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -218,9 +218,15 @@ enum devlink_attr {
DEVLINK_ATTR_PERM_CFG_NUM_VF_PER_PF,/* u32 */
DEVLINK_ATTR_PERM_CFG_MAX_NUM_PF_MSIX_VECT, /* u32 */
DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF,  /* u32 */
+   DEVLINK_ATTR_PERM_CFG_NPAR_NUM_PARTITIONS_PER_PORT, /* u32 */
+   DEVLINK_ATTR_PERM_CFG_NPAR_BW_IN_PERCENT,   /* u32 */
+   DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION,  /* u32 */
+   DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION_VALID,/* u8 */
+   DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT,/* u32 */
+   DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID,  /* u8 */
 
/* Add new permanent config parameters above here */
-   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF,
+   DEVLINK_ATTR_PERM_CFG_LAST = DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID,
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 427a65e..76bb6d4 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2543,6 +2543,13 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_PERM_CFG_NUM_VF_PER_PF] = { .type = NLA_U32 },
[DEVLINK_ATTR_PERM_CFG_MAX_NUM_PF_MSIX_VECT] = { .type = NLA_U32 },
[DEVLINK_ATTR_PERM_CFG_MSIX_VECTORS_PER_VF] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_NPAR_NUM_PARTITIONS_PER_PORT] = {
+   .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_NPAR_BW_IN_PERCENT] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_NPAR_BW_RESERVATION_VALID] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT] = { .type = NLA_U32 },
+   [DEVLINK_ATTR_PERM_CFG_NPAR_BW_LIMIT_VALID] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
-- 
2.7.4

Re: [PATCH v3 net-next] tcp: Remove use of daddr_cache in tracepoint

2017-10-17 Thread Cong Wang

On Tue, Oct 17, 2017 at 1:09 PM, David Ahern  wrote:
> Running perf in one window to capture tcp_retransmit_skb tracepoint:
> $ perf record -e tcp:tcp_retransmit_skb -a
>
> And causing a retransmission on an active TCP session (e.g., dropping
> packets in the receiver, changing MTU on the interface to 500 and back
> to 1500) triggers a panic:
>
> [   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0008
> [   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
> [   58.546770] PGD 0 P4D 0
> [   58.547472] Oops:  [#1] SMP
> [   58.548328] Modules linked in: vrf
> [   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
> [   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [   58.554560] task: 81a0e540 task.stack: 81a0
> [   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
> [   58.557137] RSP: 0018:88003fc03d68 EFLAGS: 00010282
> [   58.558292] RAX:  RBX: e8c0ec80 RCX: 880038543098
> [   58.559850] RDX: 0400 RSI: 88003fc03d70 RDI: 88003fc14b68
> [   58.561099] RBP: 88003fc03da8 R08:  R09: ead3224a
> [   58.562005] R10: 88003fc03db8 R11: 0010 R12: 8800385428c0
> [   58.562930] R13: e8c0e478 R14: 81a93a40 R15: 88003d4f0c00
> [   58.563845] FS:  () GS:88003fc0() knlGS:
> [   58.564873] CS:  0010 DS:  ES:  CR0: 80050033
> [   58.565613] CR2: 0008 CR3: 3d68f004 CR4: 000606f0
> [   58.566538] Call Trace:
> [   58.566865]  
> [   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
> [   58.567704]  ? tcp_set_ca_state+0x22/0x3f
> [   58.568231]  tcp_retransmit_skb+0x14/0xa3
> [   58.568754]  tcp_retransmit_timer+0x472/0x5e3
> [   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
> [   58.569946]  tcp_write_timer_handler+0x95/0x1e9
> [   58.570548]  tcp_write_timer+0x2a/0x58
>
> Remove use of ipv6_pinfo in favor of data in sock_common.
>
> Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
> Signed-off-by: David Ahern 

Acked-by: Cong Wang

[PATCH] mac80211: aggregation: Convert timers to use timer_setup()

2017-10-17 Thread Kees Cook

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

This removes the tid mapping array and expands the tid structures to
add a pointer back to the station, along with the tid index itself.

Cc: Johannes Berg 
Cc: "David S. Miller" 
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
Resend, with linux-wireless in Cc (no idea how it got missed before)
---
 net/mac80211/agg-rx.c   | 41 +
 net/mac80211/agg-tx.c   | 42 --
 net/mac80211/sta_info.c |  8 
 net/mac80211/sta_info.h | 12 ++--
 4 files changed, 43 insertions(+), 60 deletions(-)

diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 88cc1ae935ea..63aba6dbc92a 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -151,21 +151,17 @@ EXPORT_SYMBOL(ieee80211_stop_rx_ba_session);
  * After accepting the AddBA Request we activated a timer,
  * resetting it after each frame that arrives from the originator.
  */
-static void sta_rx_agg_session_timer_expired(unsigned long data)
+static void sta_rx_agg_session_timer_expired(struct timer_list *t)
 {
-   /* not an elegant detour, but there is no choice as the timer passes
-* only one argument, and various sta_info are needed here, so init
-* flow in sta_info_create gives the TID as data, while the timer_to_id
-* array gives the sta through container_of */
-   u8 *ptid = (u8 *)data;
-   u8 *timer_to_id = ptid - *ptid;
-   struct sta_info *sta = container_of(timer_to_id, struct sta_info,
-timer_to_tid[0]);
+   struct tid_ampdu_rx *tid_rx_timer =
+   from_timer(tid_rx_timer, t, session_timer);
+   struct sta_info *sta = tid_rx_timer->sta;
+   u16 tid = tid_rx_timer->tid;
struct tid_ampdu_rx *tid_rx;
unsigned long timeout;
 
rcu_read_lock();
-   tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[*ptid]);
+   tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
if (!tid_rx) {
rcu_read_unlock();
return;
@@ -180,21 +176,18 @@ static void sta_rx_agg_session_timer_expired(unsigned long data)
rcu_read_unlock();
 
ht_dbg(sta->sdata, "RX session timer expired on %pM tid %d\n",
-  sta->sta.addr, (u16)*ptid);
+  sta->sta.addr, tid);
 
-   set_bit(*ptid, sta->ampdu_mlme.tid_rx_timer_expired);
+   set_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired);
ieee80211_queue_work(&sta->local->hw, &sta->ampdu_mlme.work);
 }
 
-static void sta_rx_agg_reorder_timer_expired(unsigned long data)
+static void sta_rx_agg_reorder_timer_expired(struct timer_list *t)
 {
-   u8 *ptid = (u8 *)data;
-   u8 *timer_to_id = ptid - *ptid;
-   struct sta_info *sta = container_of(timer_to_id, struct sta_info,
-   timer_to_tid[0]);
+   struct tid_ampdu_rx *tid_rx = from_timer(tid_rx, t, reorder_timer);
 
rcu_read_lock();
-   ieee80211_release_reorder_timeout(sta, *ptid);
+   ieee80211_release_reorder_timeout(tid_rx->sta, tid_rx->tid);
rcu_read_unlock();
 }
 
@@ -356,14 +349,12 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
spin_lock_init(&tid_agg_rx->reorder_lock);
 
/* rx timer */
-   setup_deferrable_timer(&tid_agg_rx->session_timer,
-  sta_rx_agg_session_timer_expired,
-  (unsigned long)&sta->timer_to_tid[tid]);
+   timer_setup(&tid_agg_rx->session_timer,
+   sta_rx_agg_session_timer_expired, TIMER_DEFERRABLE);
 
/* rx reorder timer */
-   setup_timer(&tid_agg_rx->reorder_timer,
-   sta_rx_agg_reorder_timer_expired,
-   (unsigned long)&sta->timer_to_tid[tid]);
+   timer_setup(&tid_agg_rx->reorder_timer,
+   sta_rx_agg_reorder_timer_expired, 0);
 
/* prepare reordering buffer */
tid_agg_rx->reorder_buf =
@@ -399,6 +390,8 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
tid_agg_rx->auto_seq = auto_seq;
tid_agg_rx->started = false;
tid_agg_rx->reorder_buf_filtered = 0;
+   tid_agg_rx->tid = tid;
+   tid_agg_rx->sta = sta;
status = WLAN_STATUS_SUCCESS;
 
/* activate it for RX */
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index bef516ec47f9..dedbb1fb10e7 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -422,15 +422,12 @@ int ___ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
  * add Block Ack response will arrive from the recipient.
  * If this timer expires sta_addba_resp_timer_expired will be executed.
  */
-static void
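
The conversion described in the patch above (a timer callback that receives the struct timer_list pointer and recovers its enclosing object from it) can be illustrated with a minimal userspace sketch. Everything here is a stand-in written for illustration: the cut-down struct timer_list, the simplified from_timer() macro, demo_timer_setup() and the reduced tid_ampdu_rx layout are assumptions, not the kernel definitions. It relies on the GNU __typeof__ extension (gcc or clang).

```c
#include <stddef.h>

/* Userspace stand-in; the real struct lives in <linux/timer.h>. */
struct timer_list {
    void (*function)(struct timer_list *t);
};

/* Recover a pointer to the enclosing object from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified from_timer(): map the timer back to the object embedding it. */
#define from_timer(var, timer, field) \
    container_of(timer, __typeof__(*var), field)

struct sta_info {
    int id;                          /* hypothetical, cut down */
};

struct tid_ampdu_rx {
    struct timer_list session_timer;
    struct sta_info *sta;            /* back-pointer, as added by the patch */
    unsigned int tid;                /* tid index, as added by the patch */
};

static int last_expired_tid = -1;    /* illustration only */

/* The callback gets the timer itself, not an opaque unsigned long. */
static void demo_session_timer_expired(struct timer_list *t)
{
    struct tid_ampdu_rx *tid_rx = from_timer(tid_rx, t, session_timer);

    last_expired_tid = (int)tid_rx->tid;
}

/* Hypothetical stand-in for timer_setup(): only records the callback. */
static void demo_timer_setup(struct timer_list *t,
                             void (*fn)(struct timer_list *),
                             unsigned int flags)
{
    (void)flags;
    t->function = fn;
}
```

Because the timer is embedded in the per-tid structure, no side array mapping timers to tids is needed; the back-pointer and tid index stored in the structure carry that information.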

Re: [patch net-next 01/20] net: sched: add block bind/unbind notif. and extended block_get/put

2017-10-17 Thread Duyck, Alexander H

On Tue, 2017-10-17 at 22:05 +0200, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Introduce a new type of ndo_setup_tc message to propagate binding/unbinding
> of a block to the driver. Call this ndo whenever a qdisc gets/puts a block.
> Alongside this, there is a need to propagate the binder type from the qdisc
> code down to the notifier, so introduce extended variants of
> block_get/put in order to pass this info.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  include/linux/netdevice.h |  1 +
>  include/net/pkt_cls.h | 40 +
>  net/sched/cls_api.c   | 56 ---
>  3 files changed, 94 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 31bb301..062a4f5 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -771,6 +771,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>  
>  enum tc_setup_type {
>   TC_SETUP_MQPRIO,
> + TC_SETUP_BLOCK,
>   TC_SETUP_CLSU32,
>   TC_SETUP_CLSFLOWER,
>   TC_SETUP_CLSMATCHALL,

I'm not a big fan of adding this to the middle of the enum. It will
make it harder for people who have to backport changes and such, since
it reorders values that are passed as part of the kABI between
drivers and the kernel.

Also, does this patch set really need to be 20 patches long? It seems like
you could have done this as a set of 8 and another of 12, since you need
about 8 patches to get to the point where you start pulling the code
out of the drivers.

- Alex
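
The enum concern raised above can be shown with a small sketch. The three enums below are hypothetical copies cut down to the entries visible in the quoted hunk; the names and the old_driver_name() helper are made up for illustration. Inserting a value in the middle shifts every later value, so a driver built against the old numbering misreads what a newer kernel passes, while appending at the end leaves existing values unchanged.

```c
#include <string.h>

/* Numbering before the patch (cut down to the visible entries). */
enum demo_before { OLD_MQPRIO, OLD_CLSU32, OLD_CLSFLOWER, OLD_CLSMATCHALL };

/* New entry inserted in the middle: every later value shifts by one. */
enum demo_middle { MID_MQPRIO, MID_BLOCK, MID_CLSU32, MID_CLSFLOWER,
                   MID_CLSMATCHALL };

/* New entry appended at the end: existing values keep their numbers. */
enum demo_end { END_MQPRIO, END_CLSU32, END_CLSFLOWER, END_CLSMATCHALL,
                END_BLOCK };

/* How a driver compiled against the old numbering interprets a value. */
static const char *old_driver_name(int value)
{
    switch (value) {
    case OLD_MQPRIO:      return "MQPRIO";
    case OLD_CLSU32:      return "CLSU32";
    case OLD_CLSFLOWER:   return "CLSFLOWER";
    case OLD_CLSMATCHALL: return "CLSMATCHALL";
    default:              return "unknown";
    }
}
```

With the middle insertion, old_driver_name(MID_CLSU32) yields "CLSFLOWER"; with the appended variant, old_driver_name(END_CLSU32) still yields "CLSU32".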

[PATCH v3 net-next] tcp: Remove use of daddr_cache in tracepoint

2017-10-17 Thread David Ahern

Running perf in one window to capture tcp_retransmit_skb tracepoint:
$ perf record -e tcp:tcp_retransmit_skb -a

And causing a retransmission on an active TCP session (e.g., dropping
packets in the receiver, changing MTU on the interface to 500 and back
to 1500) triggers a panic:

[   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0008
[   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.546770] PGD 0 P4D 0
[   58.547472] Oops:  [#1] SMP
[   58.548328] Modules linked in: vrf
[   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
[   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   58.554560] task: 81a0e540 task.stack: 81a0
[   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.557137] RSP: 0018:88003fc03d68 EFLAGS: 00010282
[   58.558292] RAX:  RBX: e8c0ec80 RCX: 880038543098
[   58.559850] RDX: 0400 RSI: 88003fc03d70 RDI: 88003fc14b68
[   58.561099] RBP: 88003fc03da8 R08:  R09: ead3224a
[   58.562005] R10: 88003fc03db8 R11: 0010 R12: 8800385428c0
[   58.562930] R13: e8c0e478 R14: 81a93a40 R15: 88003d4f0c00
[   58.563845] FS:  () GS:88003fc0() knlGS:
[   58.564873] CS:  0010 DS:  ES:  CR0: 80050033
[   58.565613] CR2: 0008 CR3: 3d68f004 CR4: 000606f0
[   58.566538] Call Trace:
[   58.566865]  
[   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
[   58.567704]  ? tcp_set_ca_state+0x22/0x3f
[   58.568231]  tcp_retransmit_skb+0x14/0xa3
[   58.568754]  tcp_retransmit_timer+0x472/0x5e3
[   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
[   58.569946]  tcp_write_timer_handler+0x95/0x1e9
[   58.570548]  tcp_write_timer+0x2a/0x58

Remove use of ipv6_pinfo in favor of data in sock_common.

Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
Signed-off-by: David Ahern 
---
v3
- remove use of inet6_sk and check sk_family (requested by Eric)
- Add IS_ENABLED(CONFIG_IPV6) around use of sk_v6_rcv_saddr and
  sk_v6_daddr as done in sock_common (noted by Cong)

v2
- remove np and get addresses from sock_common

 include/trace/events/tcp.h | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 3d1cbd072b7e..271812216ce3 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -27,7 +27,6 @@ TRACE_EVENT(tcp_retransmit_skb,
),
 
TP_fast_assign(
-   struct ipv6_pinfo *np = inet6_sk(sk);
struct inet_sock *inet = inet_sk(sk);
struct in6_addr *pin6;
__be32 *p32;
@@ -44,12 +43,15 @@ TRACE_EVENT(tcp_retransmit_skb,
p32 = (__be32 *) __entry->daddr;
*p32 =  inet->inet_daddr;
 
-   if (np) {
+#if IS_ENABLED(CONFIG_IPV6)
+   if (sk->sk_family == AF_INET6) {
pin6 = (struct in6_addr *)__entry->saddr_v6;
-   *pin6 = np->saddr;
+   *pin6 = sk->sk_v6_rcv_saddr;
pin6 = (struct in6_addr *)__entry->daddr_v6;
-   *pin6 = *(np->daddr_cache);
-   } else {
+   *pin6 = sk->sk_v6_daddr;
+   } else
+#endif
+   {
pin6 = (struct in6_addr *)__entry->saddr_v6;
ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
pin6 = (struct in6_addr *)__entry->daddr_v6;
-- 
2.1.4
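
The address selection in the patch above (take the IPv6 address directly when the socket family is AF_INET6, otherwise store the IPv4 address as an IPv4-mapped IPv6 address) can be sketched in userspace as follows. struct demo_sock and both helpers are hypothetical stand-ins, not kernel code; demo_set_v4mapped() mirrors what ipv6_addr_set_v4mapped() is understood to do, namely build ::ffff:a.b.c.d.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical cut-down socket: the family plus both address forms. */
struct demo_sock {
    sa_family_t family;
    struct in_addr v4_daddr;
    struct in6_addr v6_daddr;
};

/* Build the IPv4-mapped IPv6 form ::ffff:a.b.c.d of an IPv4 address. */
static void demo_set_v4mapped(struct in_addr v4, struct in6_addr *out)
{
    memset(out, 0, sizeof(*out));
    out->s6_addr[10] = 0xff;
    out->s6_addr[11] = 0xff;
    memcpy(&out->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}

/*
 * Choose the source of the address by socket family instead of by
 * testing a per-socket pointer that may not be valid for this socket.
 */
static void demo_fill_daddr_v6(const struct demo_sock *sk,
                               struct in6_addr *out)
{
    if (sk->family == AF_INET6)
        *out = sk->v6_daddr;
    else
        demo_set_v4mapped(sk->v4_daddr, out);
}
```

Keying the decision on the family means no pointer has to be dereferenced to decide which branch applies.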

[patch net-next 01/20] net: sched: add block bind/unbind notif. and extended block_get/put

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Introduce a new type of ndo_setup_tc message to propagate binding/unbinding
of a block to the driver. Call this ndo whenever a qdisc gets/puts a block.
Alongside this, there is a need to propagate the binder type from the qdisc
code down to the notifier, so introduce extended variants of
block_get/put in order to pass this info.

Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h | 40 +
 net/sched/cls_api.c   | 56 ---
 3 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 31bb301..062a4f5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -771,6 +771,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 
 enum tc_setup_type {
TC_SETUP_MQPRIO,
+   TC_SETUP_BLOCK,
TC_SETUP_CLSU32,
TC_SETUP_CLSFLOWER,
TC_SETUP_CLSMATCHALL,
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 49a143e..41bc7d7 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -17,13 +17,27 @@ struct tcf_walker {
 int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
+enum tcf_block_binder_type {
+   TCF_BLOCK_BINDER_TYPE_UNSPEC,
+};
+
+struct tcf_block_ext_info {
+   enum tcf_block_binder_type binder_type;
+};
+
 #ifdef CONFIG_NET_CLS
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
bool create);
 void tcf_chain_put(struct tcf_chain *chain);
 int tcf_block_get(struct tcf_block **p_block,
  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q);
+int tcf_block_get_ext(struct tcf_block **p_block,
+ struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+ struct tcf_block_ext_info *ei);
 void tcf_block_put(struct tcf_block *block);
+void tcf_block_put_ext(struct tcf_block *block,
+  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+  struct tcf_block_ext_info *ei);
 
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
@@ -46,10 +60,25 @@ int tcf_block_get(struct tcf_block **p_block,
return 0;
 }
 
+static inline
+int tcf_block_get_ext(struct tcf_block **p_block,
+ struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+ struct tcf_block_ext_info *ei)
+{
+   return 0;
+}
+
 static inline void tcf_block_put(struct tcf_block *block)
 {
 }
 
+static inline
+void tcf_block_put_ext(struct tcf_block *block,
+  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+  struct tcf_block_ext_info *ei)
+{
+}
+
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
return NULL;
@@ -434,6 +463,17 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 int tc_setup_cb_call(struct tcf_exts *exts, enum tc_setup_type type,
 void *type_data, bool err_stop);
 
+enum tc_block_command {
+   TC_BLOCK_BIND,
+   TC_BLOCK_UNBIND,
+};
+
+struct tc_block_offload {
+   enum tc_block_command command;
+   enum tcf_block_binder_type binder_type;
+   struct tcf_block *block;
+};
+
 struct tc_cls_common_offload {
u32 chain_index;
__be16 protocol;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2e8e87f..92dce26 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,8 +240,36 @@ tcf_chain_filter_chain_ptr_set(struct tcf_chain *chain,
chain->p_filter_chain = p_filter_chain;
 }
 
-int tcf_block_get(struct tcf_block **p_block,
- struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q)
+static void tcf_block_offload_cmd(struct tcf_block *block, struct Qdisc *q,
+ struct tcf_block_ext_info *ei,
+ enum tc_block_command command)
+{
+   struct net_device *dev = q->dev_queue->dev;
+   struct tc_block_offload bo = {};
+
+   if (!tc_can_offload(dev))
+   return;
+   bo.command = command;
+   bo.binder_type = ei->binder_type;
+   bo.block = block;
+   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+}
+
+static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
+  struct tcf_block_ext_info *ei)
+{
+   tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND);
+}
+
+static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
+struct tcf_block_ext_info *ei)
+{
+   tcf_block_offload_cmd(block, q, ei, TC_BLOCK_UNBIND);
+}
+
+int tcf_block_get_ext(struct tcf_block **p_block,
+ struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+ struct

[patch net-next 02/20] net: sched: use extended variants of block_get/put in ingress and clsact qdiscs

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Use the previously introduced extended variants of the block get and put
functions. This allows specifying binder types specific to clsact
ingress/egress, which lets drivers distinguish who actually got the
block.

Signed-off-by: Jiri Pirko 
---
 include/net/pkt_cls.h   |  2 ++
 net/sched/sch_ingress.c | 36 +---
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 41bc7d7..5c50af8 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -19,6 +19,8 @@ int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
 enum tcf_block_binder_type {
TCF_BLOCK_BINDER_TYPE_UNSPEC,
+   TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+   TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
 };
 
 struct tcf_block_ext_info {
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 9ccc1b8..b599db2 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -20,6 +20,7 @@
 
 struct ingress_sched_data {
struct tcf_block *block;
+   struct tcf_block_ext_info block_info;
 };
 
 static struct Qdisc *ingress_leaf(struct Qdisc *sch, unsigned long arg)
@@ -59,7 +60,10 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
struct net_device *dev = qdisc_dev(sch);
int err;
 
-   err = tcf_block_get(&q->block, &dev->ingress_cl_list, sch);
+   q->block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+
+   err = tcf_block_get_ext(&q->block, &dev->ingress_cl_list,
+   sch, &q->block_info);
if (err)
return err;
 
@@ -72,8 +76,10 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
 static void ingress_destroy(struct Qdisc *sch)
 {
struct ingress_sched_data *q = qdisc_priv(sch);
+   struct net_device *dev = qdisc_dev(sch);
 
-   tcf_block_put(q->block);
+   tcf_block_put_ext(q->block, &dev->ingress_cl_list,
+ sch, &q->block_info);
net_dec_ingress_queue();
 }
 
@@ -114,6 +120,8 @@ static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
 struct clsact_sched_data {
struct tcf_block *ingress_block;
struct tcf_block *egress_block;
+   struct tcf_block_ext_info ingress_block_info;
+   struct tcf_block_ext_info egress_block_info;
 };
 
 static unsigned long clsact_find(struct Qdisc *sch, u32 classid)
@@ -153,13 +161,19 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
struct net_device *dev = qdisc_dev(sch);
int err;
 
-   err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list, sch);
+   q->ingress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+
+   err = tcf_block_get_ext(&q->ingress_block, &dev->ingress_cl_list,
+   sch, &q->ingress_block_info);
if (err)
return err;
 
-   err = tcf_block_get(&q->egress_block, &dev->egress_cl_list, sch);
+   q->egress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
+
+   err = tcf_block_get_ext(&q->egress_block, &dev->egress_cl_list,
+   sch, &q->egress_block_info);
if (err)
-   return err;
+   goto err_egress_block_get;
 
net_inc_ingress_queue();
net_inc_egress_queue();
@@ -167,14 +181,22 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
sch->flags |= TCQ_F_CPUSTATS;
 
return 0;
+
+err_egress_block_get:
+   tcf_block_put_ext(q->ingress_block, &dev->ingress_cl_list,
+ sch, &q->ingress_block_info);
+   return err;
 }
 
 static void clsact_destroy(struct Qdisc *sch)
 {
struct clsact_sched_data *q = qdisc_priv(sch);
+   struct net_device *dev = qdisc_dev(sch);
 
-   tcf_block_put(q->egress_block);
-   tcf_block_put(q->ingress_block);
+   tcf_block_put_ext(q->egress_block, &dev->egress_cl_list,
+ sch, &q->egress_block_info);
+   tcf_block_put_ext(q->ingress_block, &dev->ingress_cl_list,
+ sch, &q->ingress_block_info);
 
net_dec_ingress_queue();
net_dec_egress_queue();
-- 
2.9.5
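
The clsact_init() change in the patch above replaces a plain early return with a goto label that releases the ingress block when acquiring the egress block fails. A minimal userspace sketch of that unwinding pattern follows; the resource counter and the demo_get()/demo_put()/demo_init() helpers are hypothetical and only model "acquire two things, undo the first if the second fails".

```c
#include <errno.h>

static int demo_acquired;   /* number of resources currently held */

/* Acquire one resource, or fail with -ENOMEM when asked to. */
static int demo_get(int fail)
{
    if (fail)
        return -ENOMEM;
    demo_acquired++;
    return 0;
}

/* Release one previously acquired resource. */
static void demo_put(void)
{
    demo_acquired--;
}

/* Acquire two resources; on failure of the second, undo the first. */
static int demo_init(int fail_second)
{
    int err;

    err = demo_get(0);
    if (err)
        return err;         /* nothing acquired yet, plain return is fine */

    err = demo_get(fail_second);
    if (err)
        goto err_second_get;

    return 0;

err_second_get:
    demo_put();             /* release what the first step acquired */
    return err;
}
```

Without the label, a failure in the second step would return with the first resource still held.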

[patch net-next 04/20] net: sched: use tc_setup_cb_call to call per-block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Extend the tc_setup_cb_call entry point, originally used only for
action egress device callbacks, to call per-block callbacks as well.

Signed-off-by: Jiri Pirko 
---
 include/net/pkt_cls.h  |  4 ++--
 net/sched/cls_api.c| 21 ++---
 net/sched/cls_flower.c |  9 ++---
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 4bc6b1c..fcca5a9 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -543,8 +543,8 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
-int tc_setup_cb_call(struct tcf_exts *exts, enum tc_setup_type type,
-void *type_data, bool err_stop);
+int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
+enum tc_setup_type type, void *type_data, bool err_stop);
 
 enum tc_block_command {
TC_BLOCK_BIND,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b16c79c..cdfdc24 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1206,10 +1206,25 @@ static int tc_exts_setup_cb_egdev_call(struct tcf_exts *exts,
return ok_count;
 }
 
-int tc_setup_cb_call(struct tcf_exts *exts, enum tc_setup_type type,
-void *type_data, bool err_stop)
+int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
+enum tc_setup_type type, void *type_data, bool err_stop)
 {
-   return tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
+   int ok_count;
+   int ret;
+
+   ret = tcf_block_cb_call(block, type, type_data, err_stop);
+   if (ret < 0)
+   return ret;
+   ok_count = ret;
+
+   if (!exts)
+   return ok_count;
+   ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
+   if (ret < 0)
+   return ret;
+   ok_count += ret;
+
+   return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 5b7bb96..76b4e0a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -201,6 +201,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
struct tc_cls_flower_offload cls_flower = {};
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
 
tc_cls_common_offload_init(&cls_flower.common, tp);
cls_flower.command = TC_CLSFLOWER_DESTROY;
@@ -209,7 +210,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
if (tc_can_offload(dev))
dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSFLOWER,
  &cls_flower);
-   tc_setup_cb_call(&f->exts, TC_SETUP_CLSFLOWER,
+   tc_setup_cb_call(block, &f->exts, TC_SETUP_CLSFLOWER,
 &cls_flower, false);
 }
 
@@ -220,6 +221,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload cls_flower = {};
+   struct tcf_block *block = tp->chain->block;
bool skip_sw = tc_skip_sw(f->flags);
int err;
 
@@ -242,7 +244,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
}
}
 
-   err = tc_setup_cb_call(&f->exts, TC_SETUP_CLSFLOWER,
+   err = tc_setup_cb_call(block, &f->exts, TC_SETUP_CLSFLOWER,
   &cls_flower, skip_sw);
if (err < 0) {
fl_hw_destroy_filter(tp, f);
@@ -261,6 +263,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
struct tc_cls_flower_offload cls_flower = {};
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
 
tc_cls_common_offload_init(&cls_flower.common, tp);
cls_flower.command = TC_CLSFLOWER_STATS;
@@ -270,7 +273,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
if (tc_can_offload(dev))
dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSFLOWER,
  &cls_flower);
-   tc_setup_cb_call(&f->exts, TC_SETUP_CLSFLOWER,
+   tc_setup_cb_call(block, &f->exts, TC_SETUP_CLSFLOWER,
 &cls_flower, false);
 }
 
-- 
2.9.5

[patch net-next 09/20] mlxsw: spectrum: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for matchall and flower offloads to block
callbacks.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 82 +++---
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index e1e11c7..08e321a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1697,17 +1697,9 @@ static void mlxsw_sp_port_del_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 }
 
 static int mlxsw_sp_setup_tc_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
- struct tc_cls_matchall_offload *f)
+ struct tc_cls_matchall_offload *f,
+ bool ingress)
 {
-   bool ingress;
-
-   if (is_classid_clsact_ingress(f->common.classid))
-   ingress = true;
-   else if (is_classid_clsact_egress(f->common.classid))
-   ingress = false;
-   else
-   return -EOPNOTSUPP;
-
if (f->common.chain_index)
return -EOPNOTSUPP;
 
@@ -1725,17 +1717,9 @@ static int mlxsw_sp_setup_tc_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 
 static int
 mlxsw_sp_setup_tc_cls_flower(struct mlxsw_sp_port *mlxsw_sp_port,
-struct tc_cls_flower_offload *f)
+struct tc_cls_flower_offload *f,
+bool ingress)
 {
-   bool ingress;
-
-   if (is_classid_clsact_ingress(f->common.classid))
-   ingress = true;
-   else if (is_classid_clsact_egress(f->common.classid))
-   ingress = false;
-   else
-   return -EOPNOTSUPP;
-
switch (f->command) {
case TC_CLSFLOWER_REPLACE:
return mlxsw_sp_flower_replace(mlxsw_sp_port, ingress, f);
@@ -1749,6 +1733,59 @@ mlxsw_sp_setup_tc_cls_flower(struct mlxsw_sp_port *mlxsw_sp_port,
}
 }
 
+static int mlxsw_sp_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv, bool ingress)
+{
+   struct mlxsw_sp_port *mlxsw_sp_port = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSMATCHALL:
+   return mlxsw_sp_setup_tc_cls_matchall(mlxsw_sp_port, type_data,
+ ingress);
+   case TC_SETUP_CLSFLOWER:
+   return mlxsw_sp_setup_tc_cls_flower(mlxsw_sp_port, type_data,
+   ingress);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int mlxsw_sp_setup_tc_block_cb_ig(enum tc_setup_type type,
+void *type_data, void *cb_priv)
+{
+   return mlxsw_sp_setup_tc_block_cb(type, type_data, cb_priv, true);
+}
+
+static int mlxsw_sp_setup_tc_block_cb_eg(enum tc_setup_type type,
+void *type_data, void *cb_priv)
+{
+   return mlxsw_sp_setup_tc_block_cb(type, type_data, cb_priv, false);
+}
+
+static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
+  struct tc_block_offload *f)
+{
+   tc_setup_cb_t *cb;
+
+   if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   cb = mlxsw_sp_setup_tc_block_cb_ig;
+   else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
+   cb = mlxsw_sp_setup_tc_block_cb_eg;
+   else
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, cb, mlxsw_sp_port,
+mlxsw_sp_port);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, cb, mlxsw_sp_port);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int mlxsw_sp_setup_tc(struct net_device *dev, enum tc_setup_type type,
 void *type_data)
 {
@@ -1756,9 +1793,10 @@ static int mlxsw_sp_setup_tc(struct net_device *dev, enum tc_setup_type type,
 
switch (type) {
case TC_SETUP_CLSMATCHALL:
-   return mlxsw_sp_setup_tc_cls_matchall(mlxsw_sp_port, type_data);
case TC_SETUP_CLSFLOWER:
-   return mlxsw_sp_setup_tc_cls_flower(mlxsw_sp_port, type_data);
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return mlxsw_sp_setup_tc_block(mlxsw_sp_port, type_data);
default:
return -EOPNOTSUPP;
}
-- 
2.9.5

[patch net-next 03/20] net: sched: introduce per-block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload an inserted rule for a specific block.

Signed-off-by: Jiri Pirko 
---
 include/net/pkt_cls.h |  81 +++
 include/net/sch_generic.h |   1 +
 net/sched/cls_api.c   | 105 ++
 3 files changed, 187 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 5c50af8..4bc6b1c 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -27,6 +27,8 @@ struct tcf_block_ext_info {
enum tcf_block_binder_type binder_type;
 };
 
+struct tcf_block_cb;
+
 #ifdef CONFIG_NET_CLS
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
bool create);
@@ -51,6 +53,21 @@ static inline struct net_device *tcf_block_dev(struct tcf_block *block)
return tcf_block_q(block)->dev_queue->dev;
 }
 
+void *tcf_block_cb_priv(struct tcf_block_cb *block_cb);
+struct tcf_block_cb *tcf_block_cb_lookup(struct tcf_block *block,
+tc_setup_cb_t *cb, void *cb_ident);
+void tcf_block_cb_incref(struct tcf_block_cb *block_cb);
+unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb);
+struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
+tc_setup_cb_t *cb, void *cb_ident,
+void *cb_priv);
+int tcf_block_cb_register(struct tcf_block *block,
+ tc_setup_cb_t *cb, void *cb_ident,
+ void *cb_priv);
+void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb);
+void tcf_block_cb_unregister(struct tcf_block *block,
+tc_setup_cb_t *cb, void *cb_ident);
+
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
 
@@ -91,6 +108,70 @@ static inline struct net_device *tcf_block_dev(struct tcf_block *block)
return NULL;
 }
 
+static inline
+int tc_setup_cb_block_register(struct tcf_block *block, tc_setup_cb_t *cb,
+  void *cb_priv)
+{
+   return 0;
+}
+
+static inline
+void tc_setup_cb_block_unregister(struct tcf_block *block, tc_setup_cb_t *cb,
+ void *cb_priv)
+{
+}
+
+static inline
+void *tcf_block_cb_priv(struct tcf_block_cb *block_cb)
+{
+   return NULL;
+}
+
+static inline
+struct tcf_block_cb *tcf_block_cb_lookup(struct tcf_block *block,
+tc_setup_cb_t *cb, void *cb_ident)
+{
+   return NULL;
+}
+
+static inline
+void tcf_block_cb_incref(struct tcf_block_cb *block_cb)
+{
+}
+
+static inline
+unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb)
+{
+   return 0;
+}
+
+static inline
+struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
+tc_setup_cb_t *cb, void *cb_ident,
+void *cb_priv)
+{
+   return NULL;
+}
+
+static inline
+int tcf_block_cb_register(struct tcf_block *block,
+ tc_setup_cb_t *cb, void *cb_ident,
+ void *cb_priv)
+{
+   return 0;
+}
+
+static inline
+void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb)
+{
+}
+
+static inline
+void tcf_block_cb_unregister(struct tcf_block *block,
+tc_setup_cb_t *cb, void *cb_ident)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 0aea9e2..031dffd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -272,6 +272,7 @@ struct tcf_block {
struct list_head chain_list;
struct net *net;
struct Qdisc *q;
+   struct list_head cb_list;
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 92dce26..b16c79c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -278,6 +278,8 @@ int tcf_block_get_ext(struct tcf_block **p_block,
if (!block)
return -ENOMEM;
INIT_LIST_HEAD(&block->chain_list);
+   INIT_LIST_HEAD(&block->cb_list);
+
/* Create chain 0 by default, it has to be always present. */
chain = tcf_chain_create(block, 0);
if (!chain) {
@@ -354,6 +356,109 @@ void tcf_block_put(struct tcf_block *block)
 }
 EXPORT_SYMBOL(tcf_block_put);
 
+struct tcf_block_cb {
+   struct list_head list;
+   tc_setup_cb_t *cb;
+   void *cb_ident;
+   void *cb_priv;
+   unsigned int refcnt;
+};
+
+void *tcf_block_cb_priv(struct tcf_block_cb *block_cb)
+{
+   return block_cb->cb_priv;

[patch net-next 06/20] net: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnode

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Signed-off-by: Jiri Pirko 
---
 net/sched/cls_u32.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index b6d4606..f407f13 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -462,7 +462,7 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
return 0;
 }
 
-static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_u32_offload cls_u32 = {};
@@ -471,8 +471,10 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
return;
 
tc_cls_common_offload_init(&cls_u32.common, tp);
-   cls_u32.command = TC_CLSU32_DELETE_KNODE;
-   cls_u32.knode.handle = handle;
+   cls_u32.command = TC_CLSU32_DELETE_HNODE;
+   cls_u32.hnode.divisor = h->divisor;
+   cls_u32.hnode.handle = h->handle;
+   cls_u32.hnode.prio = h->prio;
 
dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
 }
@@ -500,7 +502,7 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
return 0;
 }
 
-static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_u32_offload cls_u32 = {};
@@ -509,10 +511,8 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, 
struct tc_u_hnode *h)
return;
 
tc_cls_common_offload_init(&cls_u32.common, tp);
-   cls_u32.command = TC_CLSU32_DELETE_HNODE;
-   cls_u32.hnode.divisor = h->divisor;
-   cls_u32.hnode.handle = h->handle;
-   cls_u32.hnode.prio = h->prio;
+   cls_u32.command = TC_CLSU32_DELETE_KNODE;
+   cls_u32.knode.handle = handle;
 
dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
 }
-- 
2.9.5

[patch net-next 05/20] net: sched: cls_matchall: call block callbacks for offload

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Use the newly introduced callback infrastructure and call block
callbacks alongside the existing per-netdev ndo_setup_tc.
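
The decision logic that the replace path below ends up with can be modelled
as a pure function. This is only a user-space sketch read off the diff in this
mail: offload_replace_result() and its parameters are hypothetical names, the
ndo and callback results are passed in as plain integers, and the undo step
that the real code performs on a callback error is left to the caller.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the result handling in a "replace" offload path.
 *   can_offload: the netdev accepts ndo_setup_tc offload requests
 *   ndo_err:     return value of the per-netdev ndo call (0 on success)
 *   cb_ret:      return value of the block callback call; negative is an
 *                error, positive means at least one callback accepted
 *   skip_sw:     the filter must live in hardware only
 *   in_hw:       set to true if any path put the filter into hardware
 */
static int offload_replace_result(bool can_offload, int ndo_err, int cb_ret,
				  bool skip_sw, bool *in_hw)
{
	*in_hw = false;

	if (can_offload) {
		if (ndo_err) {
			/* A failed ndo call is fatal only for skip_sw. */
			if (skip_sw)
				return ndo_err;
		} else {
			*in_hw = true;
		}
	}

	if (cb_ret < 0)
		return cb_ret;	/* the real code also destroys the hw filter */
	if (cb_ret > 0)
		*in_hw = true;

	/* skip_sw with nothing in hardware leaves the filter nowhere. */
	if (skip_sw && !*in_hw)
		return -EINVAL;

	return 0;
}
```

With skip_sw unset, a failing ndo call is tolerated and the filter simply
stays in software; with skip_sw set, the same failure is returned to the
caller.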

Signed-off-by: Jiri Pirko 
---
 net/sched/cls_matchall.c | 72 ++--
 1 file changed, 45 insertions(+), 27 deletions(-)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index eeac606..5278534 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -50,50 +50,73 @@ static void mall_destroy_rcu(struct rcu_head *rcu)
kfree(head);
 }
 
-static int mall_replace_hw_filter(struct tcf_proto *tp,
- struct cls_mall_head *head,
- unsigned long cookie)
+static void mall_destroy_hw_filter(struct tcf_proto *tp,
+  struct cls_mall_head *head,
+  unsigned long cookie)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_matchall_offload cls_mall = {};
-   int err;
+   struct tcf_block *block = tp->chain->block;
 
tc_cls_common_offload_init(&cls_mall.common, tp);
-   cls_mall.command = TC_CLSMATCHALL_REPLACE;
-   cls_mall.exts = &head->exts;
+   cls_mall.command = TC_CLSMATCHALL_DESTROY;
cls_mall.cookie = cookie;
 
-   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSMATCHALL,
-   &cls_mall);
-   if (!err)
-   head->flags |= TCA_CLS_FLAGS_IN_HW;
-
-   return err;
+   if (tc_can_offload(dev))
+   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSMATCHALL,
+ &cls_mall);
+   tc_setup_cb_call(block, NULL, TC_SETUP_CLSMATCHALL, &cls_mall, false);
 }
 
-static void mall_destroy_hw_filter(struct tcf_proto *tp,
-  struct cls_mall_head *head,
-  unsigned long cookie)
+static int mall_replace_hw_filter(struct tcf_proto *tp,
+ struct cls_mall_head *head,
+ unsigned long cookie)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_matchall_offload cls_mall = {};
+   struct tcf_block *block = tp->chain->block;
+   bool skip_sw = tc_skip_sw(head->flags);
+   int err;
 
tc_cls_common_offload_init(&cls_mall.common, tp);
-   cls_mall.command = TC_CLSMATCHALL_DESTROY;
+   cls_mall.command = TC_CLSMATCHALL_REPLACE;
+   cls_mall.exts = &head->exts;
cls_mall.cookie = cookie;
 
-   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSMATCHALL, &cls_mall);
+   if (tc_can_offload(dev)) {
+   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSMATCHALL,
+   &cls_mall);
+   if (err) {
+   if (skip_sw)
+   return err;
+   } else {
+   head->flags |= TCA_CLS_FLAGS_IN_HW;
+   }
+   }
+
+   err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSMATCHALL,
+  &cls_mall, skip_sw);
+   if (err < 0) {
+   mall_destroy_hw_filter(tp, head, cookie);
+   return err;
+   } else if (err > 0) {
+   head->flags |= TCA_CLS_FLAGS_IN_HW;
+   }
+
+   if (skip_sw && !(head->flags & TCA_CLS_FLAGS_IN_HW))
+   return -EINVAL;
+
+   return 0;
 }
 
 static void mall_destroy(struct tcf_proto *tp)
 {
struct cls_mall_head *head = rtnl_dereference(tp->root);
-   struct net_device *dev = tp->q->dev_queue->dev;
 
if (!head)
return;
 
-   if (tc_should_offload(dev, head->flags))
+   if (!tc_skip_hw(head->flags))
mall_destroy_hw_filter(tp, head, (unsigned long) head);
 
call_rcu(&head->rcu, mall_destroy_rcu);
@@ -133,7 +156,6 @@ static int mall_change(struct net *net, struct sk_buff 
*in_skb,
   void **arg, bool ovr)
 {
struct cls_mall_head *head = rtnl_dereference(tp->root);
-   struct net_device *dev = tp->q->dev_queue->dev;
struct nlattr *tb[TCA_MATCHALL_MAX + 1];
struct cls_mall_head *new;
u32 flags = 0;
@@ -173,14 +195,10 @@ static int mall_change(struct net *net, struct sk_buff 
*in_skb,
if (err)
goto err_set_parms;
 
-   if (tc_should_offload(dev, flags)) {
+   if (!tc_skip_hw(new->flags)) {
err = mall_replace_hw_filter(tp, new, (unsigned long) new);
-   if (err) {
-   if (tc_skip_sw(flags))
-   goto err_replace_hw_filter;
-   else
-   err = 0;
-   }
+   if (err)
+   goto err_replace_hw_filter;
}
 
if (!tc_in_hw(new->flags))
-- 
2.9.5

[patch net-next 10/20] mlx5e: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.
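
The driver-side shape that this and the following conversions share can be
sketched in user space as below. Everything prefixed demo_ or DEMO_ is a
hypothetical stand-in, not a kernel or mlx5 symbol, and the block keeps a
single callback slot for brevity, which is a simplifying assumption of the
sketch. It only illustrates the flow: refuse binder types other than ingress,
register the callback on bind, drop it on unbind, and let the callback
dispatch on the setup type.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the binder type, command and setup type. */
enum demo_binder { DEMO_BINDER_INGRESS, DEMO_BINDER_EGRESS };
enum demo_cmd { DEMO_BLOCK_BIND, DEMO_BLOCK_UNBIND };
#define DEMO_SETUP_CLSFLOWER 1

typedef int (*demo_cb_t)(int type, void *type_data, void *cb_priv);

/* One-slot callback registry; kept this small only for the sketch. */
struct demo_block {
	demo_cb_t cb;
	void *cb_priv;
};

struct demo_block_offload {
	enum demo_cmd command;
	enum demo_binder binder_type;
	struct demo_block *block;
};

struct demo_priv {
	int flower_calls;	/* counts classifier requests seen */
};

/* Per-block callback: receives the driver private data, not a netdev. */
static int demo_block_cb(int type, void *type_data, void *cb_priv)
{
	struct demo_priv *priv = cb_priv;

	(void)type_data;
	switch (type) {
	case DEMO_SETUP_CLSFLOWER:
		priv->flower_calls++;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

/* Handler for the block bind/unbind notification. */
static int demo_setup_tc_block(struct demo_priv *priv,
			       struct demo_block_offload *f)
{
	if (f->binder_type != DEMO_BINDER_INGRESS)
		return -EOPNOTSUPP;

	switch (f->command) {
	case DEMO_BLOCK_BIND:
		if (f->block->cb)
			return -EEXIST;
		f->block->cb = demo_block_cb;
		f->block->cb_priv = priv;
		return 0;
	case DEMO_BLOCK_UNBIND:
		f->block->cb = NULL;
		f->block->cb_priv = NULL;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

Checking the binder type once at bind time is what lets the per-request
classid check disappear from the classifier handlers in the diff.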

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 45 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 24 +---
 3 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ca8845b..e613ce0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1056,8 +1056,8 @@ int mlx5e_ethtool_get_ts_info(struct mlx5e_priv *priv,
 int mlx5e_ethtool_flash_device(struct mlx5e_priv *priv,
   struct ethtool_flash *flash);
 
-int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
-  void *type_data);
+int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+   void *cb_priv);
 
 /* mlx5e generic netdev management API */
 struct net_device*
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3a1969a..e810868 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3083,13 +3083,10 @@ static int mlx5e_setup_tc_mqprio(struct net_device 
*netdev,
 }
 
 #ifdef CONFIG_MLX5_ESWITCH
-static int mlx5e_setup_tc_cls_flower(struct net_device *dev,
+static int mlx5e_setup_tc_cls_flower(struct mlx5e_priv *priv,
 struct tc_cls_flower_offload *cls_flower)
 {
-   struct mlx5e_priv *priv = netdev_priv(dev);
-
-   if (!is_classid_clsact_ingress(cls_flower->common.classid) ||
-   cls_flower->common.chain_index)
+   if (cls_flower->common.chain_index)
return -EOPNOTSUPP;
 
switch (cls_flower->command) {
@@ -3103,6 +3100,40 @@ static int mlx5e_setup_tc_cls_flower(struct net_device 
*dev,
return -EOPNOTSUPP;
}
 }
+
+int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+   void *cb_priv)
+{
+   struct mlx5e_priv *priv = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return mlx5e_setup_tc_cls_flower(priv, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int mlx5e_setup_tc_block(struct net_device *dev,
+   struct tc_block_offload *f)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, mlx5e_setup_tc_block_cb,
+priv, priv);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, mlx5e_setup_tc_block_cb,
+   priv);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
 #endif
 
 int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
@@ -3111,7 +3142,9 @@ int mlx5e_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
case TC_SETUP_CLSFLOWER:
-   return mlx5e_setup_tc_cls_flower(dev, type_data);
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return mlx5e_setup_tc_block(dev, type_data);
 #endif
case TC_SETUP_MQPRIO:
return mlx5e_setup_tc_mqprio(dev, type_data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 765fc74..4edd92d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -691,14 +691,6 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
}
 }
 
-static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
-void *cb_priv)
-{
-   struct net_device *dev = cb_priv;
-
-   return mlx5e_setup_tc(dev, type, type_data);
-}
-
 bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
@@ -987,6 +979,7 @@ mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct 
mlx5_eswitch_rep *rep)
 {
struct mlx5e_rep_priv *rpriv;
struct net_device *netdev;
+   struct mlx5e_priv *upriv;
int err;
 
rpriv = kzalloc(sizeof(*rpriv), GFP_KERNEL);
@@ -1018,8 +1011,9 @@ mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct 
mlx5_eswitch_rep *rep)
goto

[patch net-next 07/20] net: sched: cls_u32: call block callbacks for offload

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Use the newly introduced callback infrastructure and call block
callbacks alongside the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko 
---
 net/sched/cls_u32.c | 72 ++---
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f407f13..24cc429 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -465,39 +465,57 @@ static int u32_delete_key(struct tcf_proto *tp, struct 
tc_u_knode *key)
 static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
 {
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
 
-   if (!tc_should_offload(dev, 0))
-   return;
-
tc_cls_common_offload_init(&cls_u32.common, tp);
cls_u32.command = TC_CLSU32_DELETE_HNODE;
cls_u32.hnode.divisor = h->divisor;
cls_u32.hnode.handle = h->handle;
cls_u32.hnode.prio = h->prio;
 
-   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
+   if (tc_can_offload(dev))
+   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
+   tc_setup_cb_call(block, NULL, TC_SETUP_CLSU32, &cls_u32, false);
 }
 
 static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
u32 flags)
 {
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
+   bool skip_sw = tc_skip_sw(flags);
+   bool offloaded = false;
int err;
 
-   if (!tc_should_offload(dev, flags))
-   return tc_skip_sw(flags) ? -EINVAL : 0;
-
tc_cls_common_offload_init(&cls_u32.common, tp);
cls_u32.command = TC_CLSU32_NEW_HNODE;
cls_u32.hnode.divisor = h->divisor;
cls_u32.hnode.handle = h->handle;
cls_u32.hnode.prio = h->prio;
 
-   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
-   if (tc_skip_sw(flags))
+   if (tc_can_offload(dev)) {
+   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32,
+   &cls_u32);
+   if (err) {
+   if (skip_sw)
+   return err;
+   } else {
+   offloaded = true;
+   }
+   }
+
+   err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSU32, &cls_u32, skip_sw);
+   if (err < 0) {
+   u32_clear_hw_hnode(tp, h);
return err;
+   } else if (err > 0) {
+   offloaded = true;
+   }
+
+   if (skip_sw && !offloaded)
+   return -EINVAL;
 
return 0;
 }
@@ -505,28 +523,27 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, 
struct tc_u_hnode *h,
 static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
 {
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
 
-   if (!tc_should_offload(dev, 0))
-   return;
-
tc_cls_common_offload_init(&cls_u32.common, tp);
cls_u32.command = TC_CLSU32_DELETE_KNODE;
cls_u32.knode.handle = handle;
 
-   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
+   if (tc_can_offload(dev))
+   dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
+   tc_setup_cb_call(block, NULL, TC_SETUP_CLSU32, &cls_u32, false);
 }
 
 static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
u32 flags)
 {
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
+   bool skip_sw = tc_skip_sw(flags);
int err;
 
-   if (!tc_should_offload(dev, flags))
-   return tc_skip_sw(flags) ? -EINVAL : 0;
-
tc_cls_common_offload_init(&cls_u32.common, tp);
cls_u32.command = TC_CLSU32_REPLACE_KNODE;
cls_u32.knode.handle = n->handle;
@@ -543,13 +560,28 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, 
struct tc_u_knode *n,
if (n->ht_down)
cls_u32.knode.link_handle = n->ht_down->handle;
 
-   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32, &cls_u32);
 
-   if (!err)
-   n->flags |= TCA_CLS_FLAGS_IN_HW;
+   if (tc_can_offload(dev)) {
+   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSU32,
+   &cls_u32);
+   if (err) {
+   if (skip_sw)
+   return err;
+   } else {
+   n->flags |= TCA_CLS_FLAGS_IN_HW;
+   }
+   }
 
-   if

[patch net-next 11/20] bnxt: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 37 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c  |  3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c | 43 +--
 3 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5ba4993..4dde2b8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7295,15 +7295,40 @@ int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
return 0;
 }
 
-static int bnxt_setup_flower(struct net_device *dev,
-struct tc_cls_flower_offload *cls_flower)
+static int bnxt_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
 {
-   struct bnxt *bp = netdev_priv(dev);
+   struct bnxt *bp = cb_priv;
 
if (BNXT_VF(bp))
return -EOPNOTSUPP;
 
-   return bnxt_tc_setup_flower(bp, bp->pf.fw_fid, cls_flower);
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return bnxt_tc_setup_flower(bp, bp->pf.fw_fid, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int bnxt_setup_tc_block(struct net_device *dev,
+  struct tc_block_offload *f)
+{
+   struct bnxt *bp = netdev_priv(dev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, bnxt_setup_tc_block_cb,
+bp, bp);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, bnxt_setup_tc_block_cb, bp);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
 }
 
 static int bnxt_setup_tc(struct net_device *dev, enum tc_setup_type type,
@@ -7311,7 +7336,9 @@ static int bnxt_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
 {
switch (type) {
case TC_SETUP_CLSFLOWER:
-   return bnxt_setup_flower(dev, type_data);
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return bnxt_setup_tc_block(dev, type_data);
case TC_SETUP_MQPRIO: {
struct tc_mqprio_qopt *mqprio = type_data;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 4730c04..a9cb653 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -748,8 +748,7 @@ int bnxt_tc_setup_flower(struct bnxt *bp, u16 src_fid,
 {
int rc = 0;
 
-   if (!is_classid_clsact_ingress(cls_flower->common.classid) ||
-   cls_flower->common.chain_index)
+   if (cls_flower->common.chain_index)
return -EOPNOTSUPP;
 
switch (cls_flower->command) {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
index e75db04..cc278d7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
@@ -115,10 +115,11 @@ bnxt_vf_rep_get_stats64(struct net_device *dev,
stats->tx_bytes = vf_rep->tx_stats.bytes;
 }
 
-static int bnxt_vf_rep_setup_tc(struct net_device *dev, enum tc_setup_type 
type,
-   void *type_data)
+static int bnxt_vf_rep_setup_tc_block_cb(enum tc_setup_type type,
+void *type_data,
+void *cb_priv)
 {
-   struct bnxt_vf_rep *vf_rep = netdev_priv(dev);
+   struct bnxt_vf_rep *vf_rep = cb_priv;
struct bnxt *bp = vf_rep->bp;
int vf_fid = bp->pf.vf[vf_rep->vf_idx].fw_fid;
 
@@ -130,6 +131,42 @@ static int bnxt_vf_rep_setup_tc(struct net_device *dev, 
enum tc_setup_type type,
}
 }
 
+static int bnxt_vf_rep_setup_tc_block(struct net_device *dev,
+ struct tc_block_offload *f)
+{
+   struct bnxt_vf_rep *vf_rep = netdev_priv(dev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block,
+bnxt_vf_rep_setup_tc_block_cb,
+vf_rep, vf_rep);
+   return 0;
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block,
+   bnxt_vf_rep_setup_tc_block_cb, vf_rep);
+

[patch net-next 15/20] nfp: flower: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 .../net/ethernet/netronome/nfp/flower/offload.c| 56 ++
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 6f239c2..f8523df 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -449,6 +449,10 @@ static int
 nfp_flower_repr_offload(struct nfp_app *app, struct net_device *netdev,
struct tc_cls_flower_offload *flower)
 {
+   if (!eth_proto_is_802_3(flower->common.protocol) ||
+   flower->common.chain_index)
+   return -EOPNOTSUPP;
+
switch (flower->command) {
case TC_CLSFLOWER_REPLACE:
return nfp_flower_add_offload(app, netdev, flower);
@@ -461,16 +465,52 @@ nfp_flower_repr_offload(struct nfp_app *app, struct 
net_device *netdev,
return -EOPNOTSUPP;
 }
 
-int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
-   enum tc_setup_type type, void *type_data)
+static int nfp_flower_setup_tc_block_cb(enum tc_setup_type type,
+   void *type_data, void *cb_priv)
+{
+   struct nfp_net *nn = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return nfp_flower_repr_offload(nn->app, nn->port->netdev,
+  type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int nfp_flower_setup_tc_block(struct net_device *netdev,
+struct tc_block_offload *f)
 {
-   struct tc_cls_flower_offload *cls_flower = type_data;
+   struct nfp_net *nn = netdev_priv(netdev);
 
-   if (type != TC_SETUP_CLSFLOWER ||
-   !is_classid_clsact_ingress(cls_flower->common.classid) ||
-   !eth_proto_is_802_3(cls_flower->common.protocol) ||
-   cls_flower->common.chain_index)
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
return -EOPNOTSUPP;
 
-   return nfp_flower_repr_offload(app, netdev, cls_flower);
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block,
+nfp_flower_setup_tc_block_cb,
+nn, nn);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block,
+   nfp_flower_setup_tc_block_cb,
+   nn);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
+   enum tc_setup_type type, void *type_data)
+{
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return nfp_flower_setup_tc_block(netdev, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
 }
-- 
2.9.5

[patch net-next 16/20] nfp: bpf: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for bpf offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 54 ++-
 1 file changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c 
b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 6e74f8d..64f97b3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -114,22 +114,58 @@ static void nfp_bpf_vnic_free(struct nfp_app *app, struct 
nfp_net *nn)
kfree(nn->app_priv);
 }
 
-static int nfp_bpf_setup_tc(struct nfp_app *app, struct net_device *netdev,
-   enum tc_setup_type type, void *type_data)
+static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
+void *type_data, void *cb_priv)
 {
struct tc_cls_bpf_offload *cls_bpf = type_data;
+   struct nfp_net *nn = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSBPF:
+   if (!nfp_net_ebpf_capable(nn) ||
+   cls_bpf->common.protocol != htons(ETH_P_ALL) ||
+   cls_bpf->common.chain_index)
+   return -EOPNOTSUPP;
+   return nfp_net_bpf_offload(nn, cls_bpf);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int nfp_bpf_setup_tc_block(struct net_device *netdev,
+ struct tc_block_offload *f)
+{
struct nfp_net *nn = netdev_priv(netdev);
 
-   if (type != TC_SETUP_CLSBPF || !nfp_net_ebpf_capable(nn) ||
-   !is_classid_clsact_ingress(cls_bpf->common.classid) ||
-   cls_bpf->common.protocol != htons(ETH_P_ALL) ||
-   cls_bpf->common.chain_index)
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
return -EOPNOTSUPP;
 
-   if (nn->dp.bpf_offload_xdp)
-   return -EBUSY;
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block,
+nfp_bpf_setup_tc_block_cb,
+nn, nn);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block,
+   nfp_bpf_setup_tc_block_cb,
+   nn);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
 
-   return nfp_net_bpf_offload(nn, cls_bpf);
+static int nfp_bpf_setup_tc(struct nfp_app *app, struct net_device *netdev,
+   enum tc_setup_type type, void *type_data)
+{
+   switch (type) {
+   case TC_SETUP_CLSBPF:
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return nfp_bpf_setup_tc_block(netdev, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
 }
 
 static bool nfp_bpf_tc_busy(struct nfp_app *app, struct nfp_net *nn)
-- 
2.9.5

[patch net-next 14/20] mlx5e_rep: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 44 
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 4edd92d..f59d81a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -659,13 +659,10 @@ static int mlx5e_rep_get_phys_port_name(struct net_device 
*dev,
 }
 
 static int
-mlx5e_rep_setup_tc_cls_flower(struct net_device *dev,
+mlx5e_rep_setup_tc_cls_flower(struct mlx5e_priv *priv,
  struct tc_cls_flower_offload *cls_flower)
 {
-   struct mlx5e_priv *priv = netdev_priv(dev);
-
-   if (!is_classid_clsact_ingress(cls_flower->common.classid) ||
-   cls_flower->common.chain_index)
+   if (cls_flower->common.chain_index)
return -EOPNOTSUPP;
 
switch (cls_flower->command) {
@@ -680,12 +677,47 @@ mlx5e_rep_setup_tc_cls_flower(struct net_device *dev,
}
 }
 
+static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv)
+{
+   struct mlx5e_priv *priv = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return mlx5e_rep_setup_tc_cls_flower(priv, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int mlx5e_rep_setup_tc_block(struct net_device *dev,
+   struct tc_block_offload *f)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, mlx5e_rep_setup_tc_cb,
+priv, priv);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, mlx5e_rep_setup_tc_cb, priv);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
  void *type_data)
 {
switch (type) {
case TC_SETUP_CLSFLOWER:
-   return mlx5e_rep_setup_tc_cls_flower(dev, type_data);
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return mlx5e_rep_setup_tc_block(dev, type_data);
default:
return -EOPNOTSUPP;
}
-- 
2.9.5

[patch net-next 12/20] cxgb4: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower and u32 offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 45 +
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 8d97ae6..ca0b96b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2884,8 +2884,7 @@ static int cxgb_set_tx_maxrate(struct net_device *dev, 
int index, u32 rate)
 static int cxgb_setup_tc_flower(struct net_device *dev,
struct tc_cls_flower_offload *cls_flower)
 {
-   if (!is_classid_clsact_ingress(cls_flower->common.classid) ||
-   cls_flower->common.chain_index)
+   if (cls_flower->common.chain_index)
return -EOPNOTSUPP;
 
switch (cls_flower->command) {
@@ -2903,8 +2902,7 @@ static int cxgb_setup_tc_flower(struct net_device *dev,
 static int cxgb_setup_tc_cls_u32(struct net_device *dev,
 struct tc_cls_u32_offload *cls_u32)
 {
-   if (!is_classid_clsact_ingress(cls_u32->common.classid) ||
-   cls_u32->common.chain_index)
+   if (cls_u32->common.chain_index)
return -EOPNOTSUPP;
 
switch (cls_u32->command) {
@@ -2918,9 +2916,10 @@ static int cxgb_setup_tc_cls_u32(struct net_device *dev,
}
 }
 
-static int cxgb_setup_tc(struct net_device *dev, enum tc_setup_type type,
-void *type_data)
+static int cxgb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
 {
+   struct net_device *dev = cb_priv;
struct port_info *pi = netdev2pinfo(dev);
struct adapter *adap = netdev2adap(dev);
 
@@ -2941,6 +2940,40 @@ static int cxgb_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
}
 }
 
+static int cxgb_setup_tc_block(struct net_device *dev,
+  struct tc_block_offload *f)
+{
+   struct port_info *pi = netdev2pinfo(dev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, cxgb_setup_tc_block_cb,
+pi, dev);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, cxgb_setup_tc_block_cb, pi);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int cxgb_setup_tc(struct net_device *dev, enum tc_setup_type type,
+void *type_data)
+{
+   switch (type) {
+   case TC_SETUP_CLSU32:
+   case TC_SETUP_CLSFLOWER:
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return cxgb_setup_tc_block(dev, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static netdev_features_t cxgb_fix_features(struct net_device *dev,
   netdev_features_t features)
 {
-- 
2.9.5

[patch net-next 19/20] net: sched: remove unused classid field from tc_cls_common_offload

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

It is no longer used by the drivers, so remove it.

Signed-off-by: Jiri Pirko 
---
 include/net/pkt_cls.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index fcca5a9..04caa24 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -561,7 +561,6 @@ struct tc_cls_common_offload {
u32 chain_index;
__be16 protocol;
u32 prio;
-   u32 classid;
 };
 
 static inline void
@@ -571,7 +570,6 @@ tc_cls_common_offload_init(struct tc_cls_common_offload 
*cls_common,
cls_common->chain_index = tp->chain->index;
cls_common->protocol = tp->protocol;
cls_common->prio = tp->prio;
-   cls_common->classid = tp->classid;
 }
 
 struct tc_cls_u32_knode {
-- 
2.9.5

[patch net-next 00/20] net: sched: convert cls ndo_setup_tc offload calls to per-block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

This patchset is a bit bigger, but most of the patches are doing the
same changes in multiple classifiers and drivers. I could do some
squashes, but I think it is better split.

This is another dependency on the way to shared block implementation.
The goal is to remove use of tp->q in classifiers code.

Also, this gives drivers the possibility to track binding of blocks to
qdiscs. Legacy drivers which do not support shared block offloading
register one callback per binding. That maintains the current
functionality we have with ndo_setup_tc. Drivers which support block
sharing offload register one callback per block, which saves overhead.
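
The per-block callback mechanism described above can be modelled in user
space roughly as follows. This is a sketch under assumptions, not the code
from patches 3 and 4: all demo_ names are hypothetical, a singly linked list
stands in for the kernel list, the refcount seen in struct tcf_block_cb is
left out, and the return convention of the call helper (negative error when
asked to stop on errors, otherwise the number of callbacks that accepted) is
inferred from how the classifiers in this series use the result.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*demo_setup_cb_t)(int type, void *type_data, void *cb_priv);

struct demo_block_cb {
	struct demo_block_cb *next;
	demo_setup_cb_t cb;
	void *cb_ident;	/* identifies the registrant for unregister */
	void *cb_priv;	/* handed back to cb on every call */
};

struct demo_block {
	struct demo_block_cb *cb_list;
};

static int demo_block_cb_register(struct demo_block *b, demo_setup_cb_t cb,
				  void *cb_ident, void *cb_priv)
{
	struct demo_block_cb *bcb = malloc(sizeof(*bcb));

	if (!bcb)
		return -ENOMEM;
	bcb->cb = cb;
	bcb->cb_ident = cb_ident;
	bcb->cb_priv = cb_priv;
	bcb->next = b->cb_list;	/* push onto the head of the list */
	b->cb_list = bcb;
	return 0;
}

static void demo_block_cb_unregister(struct demo_block *b, demo_setup_cb_t cb,
				     void *cb_ident)
{
	struct demo_block_cb **pp = &b->cb_list;

	while (*pp) {
		struct demo_block_cb *cur = *pp;

		if (cur->cb == cb && cur->cb_ident == cb_ident) {
			*pp = cur->next;
			free(cur);
			return;
		}
		pp = &cur->next;
	}
}

/* Call every registered callback; see the lead-in for the return value. */
static int demo_block_cb_call(struct demo_block *b, int type, void *type_data,
			      int err_stop)
{
	int ok_count = 0;

	for (struct demo_block_cb *bcb = b->cb_list; bcb; bcb = bcb->next) {
		int err = bcb->cb(type, type_data, bcb->cb_priv);

		if (err) {
			if (err_stop)
				return err;
		} else {
			ok_count++;
		}
	}
	return ok_count;
}

/* Example callbacks: one accepts and counts, one always refuses. */
static int demo_accept_cb(int type, void *type_data, void *cb_priv)
{
	int *hits = cb_priv;

	(void)type;
	(void)type_data;
	(*hits)++;
	return 0;
}

static int demo_reject_cb(int type, void *type_data, void *cb_priv)
{
	(void)type;
	(void)type_data;
	(void)cb_priv;
	return -EOPNOTSUPP;
}
```

In this model a legacy driver would call the register helper once for every
binding it is notified about, while a driver that supports shared blocks
would register once per block and be called for every qdisc that shares it.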

Patches 1-4 introduce the binding notifications and per-block callbacks
Patches 5-8 add block callback calls to classifiers
Patches 9-17 convert ndo_setup_tc calls to block callbacks for
 classifier offloads in drivers
Patches 18-20 do cleanup

Jiri Pirko (20):
  net: sched: add block bind/unbind notif. and extended block_get/put
  net: sched: use extended variants of block_get/put in ingress and
clsact qdiscs
  net: sched: introduce per-block callbacks
  net: sched: use tc_setup_cb_call to call per-block callbacks
  net: sched: cls_matchall: call block callbacks for offload
  net: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnode
  net: sched: cls_u32: call block callbacks for offload
  net: sched: cls_bpf: call block callbacks for offload
  mlxsw: spectrum: Convert ndo_setup_tc offloads to block callbacks
  mlx5e: Convert ndo_setup_tc offloads to block callbacks
  bnxt: Convert ndo_setup_tc offloads to block callbacks
  cxgb4: Convert ndo_setup_tc offloads to block callbacks
  ixgbe: Convert ndo_setup_tc offloads to block callbacks
  mlx5e_rep: Convert ndo_setup_tc offloads to block callbacks
  nfp: flower: Convert ndo_setup_tc offloads to block callbacks
  nfp: bpf: Convert ndo_setup_tc offloads to block callbacks
  dsa: Convert ndo_setup_tc offloads to block callbacks
  net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS*
  net: sched: remove unused classid field from tc_cls_common_offload
  net: sched: remove unused is_classid_clsact_ingress/egress helpers

 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  37 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c   |   3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c  |  41 -
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|  42 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  45 -
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  45 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  62 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  83 +++---
 drivers/net/ethernet/netronome/nfp/bpf/main.c  |  52 +-
 .../net/ethernet/netronome/nfp/flower/offload.c|  54 +-
 include/linux/netdevice.h  |   1 +
 include/net/pkt_cls.h  | 129 ++-
 include/net/pkt_sched.h|  13 --
 include/net/sch_generic.h  |   1 +
 net/dsa/slave.c|  64 ++--
 net/sched/cls_api.c| 182 -
 net/sched/cls_bpf.c|  28 +++-
 net/sched/cls_flower.c |  29 +---
 net/sched/cls_matchall.c   |  58 +++
 net/sched/cls_u32.c|  67 
 net/sched/sch_ingress.c|  36 +++-
 22 files changed, 849 insertions(+), 227 deletions(-)

-- 
2.9.5

[patch net-next 18/20] net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS*

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

All drivers are converted to use block callbacks for TC_SETUP_CLS*.
So it is now safe to remove the calls to ndo_setup_tc from the cls_* classifiers.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  2 --
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c  |  2 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|  3 ---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  2 --
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  3 ---
 drivers/net/ethernet/netronome/nfp/bpf/main.c  |  2 --
 .../net/ethernet/netronome/nfp/flower/offload.c|  2 --
 net/dsa/slave.c|  2 --
 net/sched/cls_bpf.c| 14 --
 net/sched/cls_flower.c | 20 --
 net/sched/cls_matchall.c   | 16 ---
 net/sched/cls_u32.c| 31 --
 14 files changed, 103 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4dde2b8..22a94b1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7335,8 +7335,6 @@ static int bnxt_setup_tc(struct net_device *dev, enum tc_setup_type type,
 void *type_data)
 {
switch (type) {
-   case TC_SETUP_CLSFLOWER:
-   return 0; /* will be removed after conversion from ndo */
case TC_SETUP_BLOCK:
return bnxt_setup_tc_block(dev, type_data);
case TC_SETUP_MQPRIO: {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
index cc278d7..6dff5aa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
@@ -158,8 +158,6 @@ static int bnxt_vf_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
 {
switch (type) {
-   case TC_SETUP_CLSFLOWER:
-   return 0; /* will be removed after conversion from ndo */
case TC_SETUP_BLOCK:
return bnxt_vf_rep_setup_tc_block(dev, type_data);
default:
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index ca0b96b..6edbf54 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2964,9 +2964,6 @@ static int cxgb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 void *type_data)
 {
switch (type) {
-   case TC_SETUP_CLSU32:
-   case TC_SETUP_CLSFLOWER:
-   return 0; /* will be removed after conversion from ndo */
case TC_SETUP_BLOCK:
return cxgb_setup_tc_block(dev, type_data);
default:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6b52cfd..3c85120 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9432,8 +9432,6 @@ static int __ixgbe_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
 {
switch (type) {
-   case TC_SETUP_CLSU32:
-   return 0; /* will be removed after conversion from ndo */
case TC_SETUP_BLOCK:
return ixgbe_setup_tc_block(dev, type_data);
case TC_SETUP_MQPRIO:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e810868..560b208 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3141,8 +3141,6 @@ int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 {
switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
-   case TC_SETUP_CLSFLOWER:
-   return 0; /* will be removed after conversion from ndo */
case TC_SETUP_BLOCK:
return mlx5e_setup_tc_block(dev, type_data);
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index f59d81a..0edb706 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -714,8 +714,6 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
  void *type_data)
 {
switch (type) {
-   case TC_SETUP_CLSFLOWER:
-   return 0; /* will be removed after conversion from ndo */
case TC_SETUP_BLOCK:
return mlx5e_rep_setup_tc_block(dev, type_data);
default:
diff --git

[patch net-next 17/20] dsa: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for matchall offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 net/dsa/slave.c | 64 +++--
 1 file changed, 53 insertions(+), 11 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 45f4ea8..0a20b19 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -788,17 +788,9 @@ static void dsa_slave_del_cls_matchall(struct net_device *dev,
 }
 
 static int dsa_slave_setup_tc_cls_matchall(struct net_device *dev,
-  struct tc_cls_matchall_offload *cls)
+  struct tc_cls_matchall_offload *cls,
+  bool ingress)
 {
-   bool ingress;
-
-   if (is_classid_clsact_ingress(cls->common.classid))
-   ingress = true;
-   else if (is_classid_clsact_egress(cls->common.classid))
-   ingress = false;
-   else
-   return -EOPNOTSUPP;
-
if (cls->common.chain_index)
return -EOPNOTSUPP;
 
@@ -813,12 +805,62 @@ static int dsa_slave_setup_tc_cls_matchall(struct net_device *dev,
}
 }
 
+static int dsa_slave_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+  void *cb_priv, bool ingress)
+{
+   struct net_device *dev = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSMATCHALL:
+   return dsa_slave_setup_tc_cls_matchall(dev, type_data, ingress);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int dsa_slave_setup_tc_block_cb_ig(enum tc_setup_type type,
+ void *type_data, void *cb_priv)
+{
+   return dsa_slave_setup_tc_block_cb(type, type_data, cb_priv, true);
+}
+
+static int dsa_slave_setup_tc_block_cb_eg(enum tc_setup_type type,
+ void *type_data, void *cb_priv)
+{
+   return dsa_slave_setup_tc_block_cb(type, type_data, cb_priv, false);
+}
+
+static int dsa_slave_setup_tc_block(struct net_device *dev,
+   struct tc_block_offload *f)
+{
+   tc_setup_cb_t *cb;
+
+   if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   cb = dsa_slave_setup_tc_block_cb_ig;
+   else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
+   cb = dsa_slave_setup_tc_block_cb_eg;
+   else
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, cb, dev, dev);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, cb, dev);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type,
  void *type_data)
 {
switch (type) {
case TC_SETUP_CLSMATCHALL:
-   return dsa_slave_setup_tc_cls_matchall(dev, type_data);
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return dsa_slave_setup_tc_block(dev, type_data);
default:
return -EOPNOTSUPP;
}
-- 
2.9.5
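
A minimal userspace sketch of the wrapper pattern this patch relies on, assuming simplified stand-ins for the kernel types. The block callback signature has no parameter for the direction, so two thin wrappers bake the ingress flag in and the bind path picks one by binder type. All demo_* and DEMO_* names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for tc_setup_type, the binder type and -EOPNOTSUPP. */
enum demo_setup_type { DEMO_SETUP_CLSMATCHALL, DEMO_SETUP_OTHER };
enum demo_binder { DEMO_BINDER_INGRESS, DEMO_BINDER_EGRESS, DEMO_BINDER_OTHER };
#define DEMO_EOPNOTSUPP 95

/* Fixed callback signature: nothing here carries the direction. */
typedef int demo_setup_cb_t(enum demo_setup_type type, void *type_data,
			    void *cb_priv);

/* Hypothetical per-port state, passed around as cb_priv. */
struct demo_port {
	int ingress_rules;
	int egress_rules;
};

/* Common handler that does need to know the direction. */
static int demo_block_cb(enum demo_setup_type type, void *type_data,
			 void *cb_priv, bool ingress)
{
	struct demo_port *port = cb_priv;

	(void)type_data;
	switch (type) {
	case DEMO_SETUP_CLSMATCHALL:
		if (ingress)
			port->ingress_rules++;
		else
			port->egress_rules++;
		return 0;
	default:
		return -DEMO_EOPNOTSUPP;
	}
}

/* Thin wrappers that fix the direction at registration time. */
static int demo_block_cb_ig(enum demo_setup_type type, void *type_data,
			    void *cb_priv)
{
	return demo_block_cb(type, type_data, cb_priv, true);
}

static int demo_block_cb_eg(enum demo_setup_type type, void *type_data,
			    void *cb_priv)
{
	return demo_block_cb(type, type_data, cb_priv, false);
}

/* Select the wrapper by binder type; NULL means "not supported". */
static demo_setup_cb_t *demo_pick_cb(enum demo_binder binder)
{
	switch (binder) {
	case DEMO_BINDER_INGRESS:
		return demo_block_cb_ig;
	case DEMO_BINDER_EGRESS:
		return demo_block_cb_eg;
	default:
		return NULL;
	}
}
```

The same wrapper must be used for bind and unbind, since the callback pointer is part of what identifies the registration.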

[patch net-next 20/20] net: sched: remove unused is_classid_clsact_ingress/egress helpers

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

These helpers are no longer in use by drivers, so remove them.

Signed-off-by: Jiri Pirko 
---
 include/net/pkt_sched.h | 13 -
 1 file changed, 13 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2d234af..b8ecafc 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -135,19 +135,6 @@ static inline unsigned int psched_mtu(const struct net_device *dev)
return dev->mtu + dev->hard_header_len;
 }
 
-static inline bool is_classid_clsact_ingress(u32 classid)
-{
-   /* This also returns true for ingress qdisc */
-   return TC_H_MAJ(classid) == TC_H_MAJ(TC_H_CLSACT) &&
-  TC_H_MIN(classid) != TC_H_MIN(TC_H_MIN_EGRESS);
-}
-
-static inline bool is_classid_clsact_egress(u32 classid)
-{
-   return TC_H_MAJ(classid) == TC_H_MAJ(TC_H_CLSACT) &&
-  TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
-}
-
 static inline struct net *qdisc_net(struct Qdisc *q)
 {
return dev_net(q->dev_queue->dev);
-- 
2.9.5
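
For reference, a standalone userspace sketch of what the removed helpers computed. The macro values are reproduced as recalled from include/uapi/linux/pkt_sched.h and should be treated as assumptions; check them against the header before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, as recalled from include/uapi/linux/pkt_sched.h. */
#define TC_H_MAJ_MASK		0xFFFF0000U
#define TC_H_MIN_MASK		0x0000FFFFU
#define TC_H_MAJ(h)		((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h)		((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min)	(((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))
#define TC_H_INGRESS		0xFFFFFFF1U
#define TC_H_CLSACT		TC_H_INGRESS
#define TC_H_MIN_INGRESS	0xFFF2U
#define TC_H_MIN_EGRESS		0xFFF3U

/* Same logic as the removed helper: any clsact-major classid whose minor
 * is not the egress minor counts as ingress, which is why the plain
 * ingress qdisc handle also matches.
 */
static bool is_classid_clsact_ingress(uint32_t classid)
{
	return TC_H_MAJ(classid) == TC_H_MAJ(TC_H_CLSACT) &&
	       TC_H_MIN(classid) != TC_H_MIN(TC_H_MIN_EGRESS);
}

/* Same logic as the removed helper: clsact major plus the egress minor. */
static bool is_classid_clsact_egress(uint32_t classid)
{
	return TC_H_MAJ(classid) == TC_H_MAJ(TC_H_CLSACT) &&
	       TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
}
```

With block callbacks, the direction comes from the binder type at bind time instead, so drivers no longer need to decode the classid.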

[patch net-next 08/20] net: sched: cls_bpf: call block callbacks for offload

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Use the newly introduced callbacks infrastructure and call block
callbacks alongside the existing per-netdev ndo_setup_tc.

Signed-off-by: Jiri Pirko 
---
 net/sched/cls_bpf.c | 40 
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 6c6b21f..e379fdf 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -147,7 +147,10 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog)
 static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
   enum tc_clsbpf_command cmd)
 {
+   bool addorrep = cmd == TC_CLSBPF_ADD || cmd == TC_CLSBPF_REPLACE;
struct net_device *dev = tp->q->dev_queue->dev;
+   struct tcf_block *block = tp->chain->block;
+   bool skip_sw = tc_skip_sw(prog->gen_flags);
struct tc_cls_bpf_offload cls_bpf = {};
int err;
 
@@ -159,17 +162,38 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
cls_bpf.exts_integrated = prog->exts_integrated;
cls_bpf.gen_flags = prog->gen_flags;
 
-   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSBPF, &cls_bpf);
-   if (!err && (cmd == TC_CLSBPF_ADD || cmd == TC_CLSBPF_REPLACE))
-   prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
+   if (tc_can_offload(dev)) {
+   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSBPF,
+   &cls_bpf);
+   if (addorrep) {
+   if (err) {
+   if (skip_sw)
+   return err;
+   } else {
+   prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
+   }
+   }
+   }
+
+   err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
+   if (addorrep) {
+   if (err < 0) {
+   cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
+   return err;
+   } else if (err > 0) {
+   prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
+   }
+   }
 
-   return err;
+   if (addorrep && skip_sw && !(prog->gen_flags & TCA_CLS_FLAGS_IN_HW))
+   return -EINVAL;
+
+   return 0;
 }
 
 static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
   struct cls_bpf_prog *oldprog)
 {
-   struct net_device *dev = tp->q->dev_queue->dev;
struct cls_bpf_prog *obj = prog;
enum tc_clsbpf_command cmd;
bool skip_sw;
@@ -179,7 +203,7 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
(oldprog && tc_skip_sw(oldprog->gen_flags));
 
if (oldprog && oldprog->offloaded) {
-   if (tc_should_offload(dev, prog->gen_flags)) {
+   if (!tc_skip_hw(prog->gen_flags)) {
cmd = TC_CLSBPF_REPLACE;
} else if (!tc_skip_sw(prog->gen_flags)) {
obj = oldprog;
@@ -188,14 +212,14 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
return -EINVAL;
}
} else {
-   if (!tc_should_offload(dev, prog->gen_flags))
+   if (tc_skip_hw(prog->gen_flags))
return skip_sw ? -EINVAL : 0;
cmd = TC_CLSBPF_ADD;
}
 
ret = cls_bpf_offload_cmd(tp, obj, cmd);
if (ret)
-   return skip_sw ? ret : 0;
+   return ret;
 
obj->offloaded = true;
if (oldprog)
-- 
2.9.5
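
A small userspace model of the tail of cls_bpf_offload_cmd() for the add/replace case, assuming the return convention the patch relies on: a negative value from the callback call is an error, a positive value means at least one callback offloaded the program. The flag values are as recalled from include/uapi/linux/pkt_cls.h and are assumptions; the demo_* names are hypothetical. The model also shows why the final skip_sw check needs a bitwise test of TCA_CLS_FLAGS_IN_HW: a logical AND against a non-zero constant only asks whether any flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, as recalled from include/uapi/linux/pkt_cls.h. */
#define TCA_CLS_FLAGS_SKIP_HW	(1U << 0)
#define TCA_CLS_FLAGS_SKIP_SW	(1U << 1)
#define TCA_CLS_FLAGS_IN_HW	(1U << 2)
#define DEMO_EINVAL		22

/* Hypothetical model: cb_ret < 0 is an error, cb_ret > 0 means some
 * callback accepted the offload, cb_ret == 0 means nobody did.
 */
static int demo_finish_offload(uint32_t *gen_flags, int cb_ret)
{
	bool skip_sw = (*gen_flags & TCA_CLS_FLAGS_SKIP_SW) != 0;

	if (cb_ret < 0)
		return cb_ret;
	if (cb_ret > 0)
		*gen_flags |= TCA_CLS_FLAGS_IN_HW;

	/* skip_sw rules must end up in hardware, so test that one bit. */
	if (skip_sw && !(*gen_flags & TCA_CLS_FLAGS_IN_HW))
		return -DEMO_EINVAL;
	return 0;
}

/* Bitwise test: true only when the IN_HW bit itself is set. */
static bool demo_in_hw_bitwise(uint32_t gen_flags)
{
	return (gen_flags & TCA_CLS_FLAGS_IN_HW) != 0;
}

/* What "gen_flags && TCA_CLS_FLAGS_IN_HW" reduces to, since the
 * constant is non-zero: true whenever any flag at all is set.
 */
static bool demo_in_hw_logical(uint32_t gen_flags)
{
	return gen_flags != 0;
}
```

For a skip_sw program that no callback accepted, the bitwise form reports "not in hardware" and the request fails with -EINVAL, while the logical form would see the SKIP_SW bit and let it through.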

[patch net-next 13/20] ixgbe: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for u32 offloads to block callbacks.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 45 +++
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7683c14..6b52cfd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9365,13 +9365,10 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
return err;
 }
 
-static int ixgbe_setup_tc_cls_u32(struct net_device *dev,
+static int ixgbe_setup_tc_cls_u32(struct ixgbe_adapter *adapter,
  struct tc_cls_u32_offload *cls_u32)
 {
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-
-   if (!is_classid_clsact_ingress(cls_u32->common.classid) ||
-   cls_u32->common.chain_index)
+   if (cls_u32->common.chain_index)
return -EOPNOTSUPP;
 
switch (cls_u32->command) {
@@ -9390,6 +9387,40 @@ static int ixgbe_setup_tc_cls_u32(struct net_device *dev,
}
 }
 
+static int ixgbe_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+  void *cb_priv)
+{
+   struct ixgbe_adapter *adapter = cb_priv;
+
+   switch (type) {
+   case TC_SETUP_CLSU32:
+   return ixgbe_setup_tc_cls_u32(adapter, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int ixgbe_setup_tc_block(struct net_device *dev,
+   struct tc_block_offload *f)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, ixgbe_setup_tc_block_cb,
+adapter, adapter);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, ixgbe_setup_tc_block_cb,
+   adapter);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int ixgbe_setup_tc_mqprio(struct net_device *dev,
 struct tc_mqprio_qopt *mqprio)
 {
@@ -9402,7 +9433,9 @@ static int __ixgbe_setup_tc(struct net_device *dev, enum tc_setup_type type,
 {
switch (type) {
case TC_SETUP_CLSU32:
-   return ixgbe_setup_tc_cls_u32(dev, type_data);
+   return 0; /* will be removed after conversion from ndo */
+   case TC_SETUP_BLOCK:
+   return ixgbe_setup_tc_block(dev, type_data);
case TC_SETUP_MQPRIO:
return ixgbe_setup_tc_mqprio(dev, type_data);
default:
-- 
2.9.5
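
A rough userspace model of the bind/unbind flow used by the driver conversions, under stated assumptions: a block keeps a list of (cb, cb_ident, cb_priv) entries, unregistration looks an entry up by callback and identity, and calling the block returns either the first error (when asked to stop on error) or the number of callbacks that succeeded. The fixed-size array, the duplicate rejection and every demo_* name are choices of this sketch, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_CBS	8
#define DEMO_EEXIST	17
#define DEMO_ENOSPC	28
#define DEMO_EOPNOTSUPP	95

typedef int demo_cb_t(int type, void *type_data, void *cb_priv);

struct demo_block_cb {
	demo_cb_t *cb;
	void *cb_ident;		/* identifies the registrant on unbind */
	void *cb_priv;		/* handed back to the callback */
};

struct demo_block {
	struct demo_block_cb cbs[DEMO_MAX_CBS];
	size_t count;
};

/* Return the index of the (cb, cb_ident) entry, or -1 if absent. */
static int demo_cb_find(const struct demo_block *b, demo_cb_t *cb,
			const void *cb_ident)
{
	for (size_t i = 0; i < b->count; i++)
		if (b->cbs[i].cb == cb && b->cbs[i].cb_ident == cb_ident)
			return (int)i;
	return -1;
}

/* Bind: remember the callback; duplicates are rejected in this sketch. */
static int demo_block_cb_register(struct demo_block *b, demo_cb_t *cb,
				  void *cb_ident, void *cb_priv)
{
	if (demo_cb_find(b, cb, cb_ident) >= 0)
		return -DEMO_EEXIST;
	if (b->count == DEMO_MAX_CBS)
		return -DEMO_ENOSPC;
	b->cbs[b->count].cb = cb;
	b->cbs[b->count].cb_ident = cb_ident;
	b->cbs[b->count].cb_priv = cb_priv;
	b->count++;
	return 0;
}

/* Unbind: drop the matching entry by moving the last one into its slot. */
static void demo_block_cb_unregister(struct demo_block *b, demo_cb_t *cb,
				     const void *cb_ident)
{
	int i = demo_cb_find(b, cb, cb_ident);

	if (i >= 0)
		b->cbs[i] = b->cbs[--b->count];
}

/* Call every callback: first error if err_stop, else the success count. */
static int demo_block_cb_call(const struct demo_block *b, int type,
			      void *type_data, bool err_stop)
{
	int ok_count = 0;

	for (size_t i = 0; i < b->count; i++) {
		int err = b->cbs[i].cb(type, type_data, b->cbs[i].cb_priv);

		if (err) {
			if (err_stop)
				return err;
		} else {
			ok_count++;
		}
	}
	return ok_count;
}

/* Sample callbacks: one accepts and counts hits, one refuses. */
static int demo_accept(int type, void *type_data, void *cb_priv)
{
	(void)type;
	(void)type_data;
	++*(int *)cb_priv;
	return 0;
}

static int demo_reject(int type, void *type_data, void *cb_priv)
{
	(void)type;
	(void)type_data;
	(void)cb_priv;
	return -DEMO_EOPNOTSUPP;
}
```

In the ixgbe conversion above, the adapter pointer plays both roles: it is the identity used for lookup on unbind and the private data handed to the callback.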

Re: [PATCH net-next 3/3] ipv6: obsolete cached dst when removing them from fib tree

2017-10-17 Thread Paolo Abeni

On Tue, 2017-10-17 at 11:58 -0700, Wei Wang wrote:
> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
> > dst.obsolete when a cached route has expired").
> > 
> > This change brings back the dst obsoleting and pushes it a step
> > further: cached dst are always obsoleted when removed from the
> > fib tree, and removal by time expiration is now performed
> > regardless of dst->__refcnt, to be consistent with what we
> > already do for RTF_GATEWAY dst.
> > 
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Signed-off-by: Paolo Abeni 
> > ---
> >  net/ipv6/route.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 8b25a31b6b03..fce740049e3e 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
> > if (!bucket || !rt6_ex)
> > return;
> > 
> > +   /* sockets, flow cache, etc. can hold a reference to this dst, be sure
> > +* they will drop it.
> > +*/
> > +   if (rt6_ex->rt6i)
> > +   rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
> > +
> 
> Hmm... I don't really think it is needed. rt6 is created with
> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
> above function is called, it should still be that value.
> Furthermore, the later call rt6_release() calls dst_dev_put() which
> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
> been removed from the tree.

You are right, this does not look needed if we keep the chunk below.

> > net = dev_net(rt6_ex->rt6i->dst.dev);
> > rt6_ex->rt6i->rt6i_node = NULL;
> > hlist_del_rcu(&rt6_ex->hlist);
> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
> >  {
> > struct rt6_info *rt = rt6_ex->rt6i;
> > 
> > -   if (atomic_read(&rt->dst.__refcnt) == 1 &&
> > -   time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> > +   /* we are pruning and obsoleting the exception route even if others
> > +* still hold a reference to it, so that on the next dst_check() such
> > +* references can be dropped
> > +*/
> > +   if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> 
> Why do we want to change this behavior? Before my patch series, cached
> routes were only deleted from the tree in fib6_age() when
> rt->dst.__refcnt == 1, isn't it?

Yes, but that looks more like a relic from the ancient past than
something really needed. We already remove the dst from the fib tree
regardless of the refcnt if the gateway validation fails, a few lines
below in the same function.

Waiting for __refcnt to drop lets the kernel keep the exception
entry around for much longer, potentially forever if, e.g., we have a
reference in a socket dst cache and the application stops processing
packets.

Meanwhile other sockets may grab more references to (and use) the same
aged-out dst.

The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
has expired") was the solution to the above issue prior to the recent
refactor.

Cheers,

Paolo
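
The behaviour change discussed above can be modelled in userspace as two predicates over the same entry. This is only a sketch: struct demo_dst and the demo_* functions are hypothetical, and demo_time_after_eq() imitates the wrap-safe comparison style of the kernel's time_after_eq() rather than reusing it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a is at or after b" for free-running unsigned counters. */
#define demo_time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Hypothetical stand-in for the fields the aging check looks at. */
struct demo_dst {
	int refcnt;		/* 1 means only the tree holds it */
	unsigned long lastuse;	/* timestamp of last use, in ticks */
};

/* Old rule: age out only entries nobody else references. */
static bool demo_should_age_old(const struct demo_dst *d, unsigned long now,
				unsigned long timeout)
{
	return d->refcnt == 1 &&
	       demo_time_after_eq(now, d->lastuse + timeout);
}

/* Proposed rule: age out on time alone, whatever the refcount is. */
static bool demo_should_age_new(const struct demo_dst *d, unsigned long now,
				unsigned long timeout)
{
	return demo_time_after_eq(now, d->lastuse + timeout);
}
```

An entry pinned by an idle socket (refcnt above 1) is never selected by the old rule, which is the "potentially forever" case; the new rule selects it once the timeout passes and leaves the holders to notice on their next dst check.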

[PATCH v2 net] dccp/tcp: fix ireq->opt races

2017-10-17 Thread Eric Dumazet

From: Eric Dumazet 

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr 8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001
RAX: ffda RBX: 00718000 RCX: 0040c341
RDX: 0037 RSI: 20004000 RDI: 0015
RBP: 0086 R08:  R09: 
R10: 000f4240 R11: 0293 R12: 004b7fd1
R13:  R14: 2000 R15: 00025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xca/0x250 mm/slab.c:3820
 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
 __sk_destruct+0xfd/0x910

Re: [PATCH net] dccp/tcp: fix ireq->opt races

2017-10-17 Thread Eric Dumazet

On Tue, 2017-10-17 at 12:50 -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> syzkaller found another bug in DCCP/TCP stacks [1]
> 
> For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
> ireq->pktopts race"), we need to make sure we do not access
> ireq->opt unless we own the request sock.


Arg, I will send a v2 without the silly lines that confuse patchwork.

[PATCH net] dccp/tcp: fix ireq->opt races

2017-10-17 Thread Eric Dumazet

From: Eric Dumazet 

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr 8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001
RAX: ffda RBX: 00718000 RCX: 0040c341
RDX: 0037 RSI: 20004000 RDI: 0015
RBP: 0086 R08:  R09: 
R10: 000f4240 R11: 0293 R12: 004b7fd1
R13:  R14: 2000 R15: 00025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xca/0x250 mm/slab.c:3820
 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
 __sk_destruct+0xfd/0x910

Re: [PATCH 00/58] networking: Convert timers to use timer_setup()

2017-10-17 Thread Kees Cook

On Tue, Oct 17, 2017 at 7:18 AM, Kalle Valo  wrote:
> + linux-wireless
>
> Hi Kees,
>
> Kees Cook  writes:
>
>> This is the current set of outstanding networking patches to perform
>> conversions to the new timer interface (rebased to -next). This is not
>> all expected conversions, but it contains everything needed in networking
>> to eliminate init_timer(), and all the non-standard setup_*_timer() uses.
>
> So this also includes patches which I had queued for
> wireless-drivers-next:
>
> https://patchwork.kernel.org/patch/9986253/
> https://patchwork.kernel.org/patch/9986245/
>
> And looking at patchwork[1] I have even more timer_setup() related
> patches from you. It would be really helpful if you could clearly
> document to which tree you want the patches to be applied. I don't care

Hi! Sorry about that. It's been a bit tricky to juggle everything.

> if it's net-next or wireless-drivers-next as long as it's not the both
> (meaning that both Dave and me apply the same patch, which would be
> bad). The thing is that I really do not have time to figure out for
> every patch via which tree it's supposed to go.

Which split is preferred? I had been trying to separate wireless from
the rest of net (but missed some cases).

> For now I'll just drop all your timer_setup() related patches from my
> queue and I'll assume Dave will take those. Ok?
>
> [1] https://patchwork.kernel.org/project/linux-wireless/list/

I guess I'll wait to see what Dave says.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

[PATCH net] dccp/tcp: fix ireq->opt races

2017-10-17 Thread Eric Dumazet

From: Eric Dumazet 

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr 8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001
RAX: ffda RBX: 00718000 RCX: 0040c341
RDX: 0037 RSI: 20004000 RDI: 0015
RBP: 0086 R08:  R09: 
R10: 000f4240 R11: 0293 R12: 004b7fd1
R13:  R14: 2000 R15: 00025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xca/0x250 mm/slab.c:3820
 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
 __sk_destruct+0xfd/0x910

Re: [PATCH net-next 1/3] ipv6: fix route cache dump

2017-10-17 Thread Paolo Abeni

Hi,

On Tue, 2017-10-17 at 11:41 -0700, Eric Dumazet wrote:
> On Tue, Oct 17, 2017 at 11:26 AM, Wei Wang  wrote:
> > On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> > > After the commit 2b760fcf5cfb ("ipv6: hook up exception table to
> > > store dst cache"), entries in the routing cache are not shown by:
> > > 
> > > ip route show cache
> > 
> > Hi Paolo,
> > 
> > Thanks for doing this.
> > But I think your patch does not take care of the case where there are
> > a lot of cached routes in the exception table and 1 skb is just not
> > enough to dump the main route + all cached routes in the exception
> > table.
> > In this case, your patch will keep dumping the same main route.
> > 
> > I think some logic needs to be incorporated into the fib6_walk() so
> > that it can also remember the last dumped cached route if necessary in
> > the exception table and start from there for the next dump.
> > I do have a patch for that and that patch tries to keep a linked list
> > of all cached routes from the exception table in the walker struct and
> > remove any routes that are already dumped.
> > It is a bit complicated and might not be the best solution. And as
> > IPv4 already does not support dumping cached routes, I did not send
> > that out in the previous patch series.

Thanks for the feedback.

You are right, I was too hasty with this.

> Yes, since we no longer dump IPV4 cached routes, I doubt anyone
> depends on IPv6 cached routes, but not on IPv4 ones.
> 
> Paolo, do you have a concrete use case for this ?

I have a testing script looking for that, but I guess I can adapt it.

I'm fine with dropping cached routes dumping support if there is
agreement on that.

I hadn't understood that such a change was intentional.

Cheers,

Paolo
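
Wei's objection above is that a dump spanning several skbs has to remember
where it stopped, otherwise each pass restarts from the same main route. A
minimal userspace sketch of that idea follows. It is not the fib6 walker;
the names dump_cursor and dump_some are hypothetical, and plain ints stand
in for routes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor: index of the next entry that still has to be dumped. */
struct dump_cursor {
	size_t next;
};

/*
 * Copy as many entries as fit into out[0..cap) and advance the cursor.
 * Returns the number of entries copied; 0 means the dump is complete.
 * Calling it again resumes after the last entry already emitted.
 */
size_t dump_some(const int *entries, size_t n, struct dump_cursor *cur,
		 int *out, size_t cap)
{
	size_t copied = 0;

	assert(cur != NULL && cur->next <= n);
	while (cur->next < n && copied < cap)
		out[copied++] = entries[cur->next++];
	return copied;
}
```

Without the cursor, every call would start at index 0 again, which is the
"keeps dumping the same main route" behaviour described in the review.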

[PATCH 2/2] liquidio: mark expected switch fall-through in octeon_destroy_resources

2017-10-17 Thread Gustavo A. R. Silva

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).
Please verify that the actual intention of the code is to fall through.

 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index e4a112c..4c3b568 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -747,7 +747,7 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 
if (lio_wait_for_oq_pkts(oct))
dev_err(&oct->pci_dev->dev, "OQ had pending packets\n");
-
+   /* fall through */
case OCT_DEV_INTR_SET_DONE:
/* Disable interrupts  */
oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR);
-- 
2.7.4
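
For readers unfamiliar with the teardown pattern this patch annotates, here
is a small standalone sketch. The state names and step flags are
hypothetical and are not the liquidio ones; the point is only that each
case deliberately continues into the next, and the comment tells
-Wimplicit-fallthrough that this is intended.

```c
#include <assert.h>

/* Hypothetical init stages, from least to most initialized. */
enum dev_state {
	DEV_INIT_DONE,
	DEV_QUEUES_DONE,
	DEV_INTR_SET_DONE,
	DEV_RUNNING,
};

#define STEP_DRAIN_QUEUES	(1u << 0)
#define STEP_DISABLE_INTR	(1u << 1)
#define STEP_FREE_QUEUES	(1u << 2)
#define STEP_RELEASE_DEV	(1u << 3)

/* Undo everything done up to 'state'; returns the set of steps that ran. */
unsigned int destroy_resources(enum dev_state state)
{
	unsigned int steps = 0;

	switch (state) {
	case DEV_RUNNING:
		steps |= STEP_DRAIN_QUEUES;
		/* fall through */
	case DEV_INTR_SET_DONE:
		steps |= STEP_DISABLE_INTR;
		/* fall through */
	case DEV_QUEUES_DONE:
		steps |= STEP_FREE_QUEUES;
		/* fall through */
	case DEV_INIT_DONE:
		steps |= STEP_RELEASE_DEV;
		break;
	default:
		assert(!"unknown device state");
	}
	return steps;
}
```

Entering at a later stage runs every earlier stage's cleanup as well, which
is why a missing annotation here is a warning rather than a bug.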

[PATCH 1/2] liquidio: remove unnecessary NULL check before kfree in delete_glists

2017-10-17 Thread Gustavo A. R. Silva

NULL check before freeing functions like kfree is not needed.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only.

 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 2e993ce..e4a112c 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -435,8 +435,7 @@ static void delete_glists(struct lio *lio)
do {
g = (struct octnic_gather *)
list_delete_head(&lio->glist[i]);
-   if (g)
-   kfree(g);
+   kfree(g);
} while (g);
 
if (lio->glists_virt_base && lio->glists_virt_base[i] &&
-- 
2.7.4
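
The same rule holds for free() in userspace C: passing NULL is defined to
do nothing, so a guard in front of it is redundant. A minimal userspace
sketch, assuming a hypothetical gather_buf type that is not part of the
driver:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer pair; 'meta' may legitimately be NULL. */
struct gather_buf {
	char *data;
	char *meta;
};

/* Release both members without NULL checks; safe to call repeatedly. */
void gather_buf_release(struct gather_buf *b)
{
	assert(b != NULL);
	free(b->data);	/* free(NULL) is a no-op, so no guard is needed */
	free(b->meta);
	b->data = NULL;
	b->meta = NULL;
}
```

Resetting the pointers afterwards keeps a second call harmless, which is
the property the unconditional kfree() in the patch relies on as well.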

Re: [PATCH net-next 3/3] ipv6: obsolete cached dst when removing them from fib tree

2017-10-17 Thread Wei Wang

On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
> dst.obsolete when a cached route has expired").
>
> This change brings back the dst obsoleting and pushes it a step
> further: cached dsts are always obsoleted when removed from the
> fib tree, and removal by time expiration is now performed
> regardless of dst->__refcnt, to be consistent with what we
> already do for RTF_GATEWAY dst.
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv6/route.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 8b25a31b6b03..fce740049e3e 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
> if (!bucket || !rt6_ex)
> return;
>
> +   /* sockets, flow cache, etc. can hold a reference to this dst, be sure
> +* they will drop it.
> +*/
> +   if (rt6_ex->rt6i)
> +   rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
> +

Hmm... I don't really think it is needed. rt6 is created with
rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
above function is called, it should still be that value.
Furthermore, the later call rt6_release() calls dst_dev_put() which
sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
been removed from the tree.

> net = dev_net(rt6_ex->rt6i->dst.dev);
> rt6_ex->rt6i->rt6i_node = NULL;
> hlist_del_rcu(&rt6_ex->hlist);
> @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
>  {
> struct rt6_info *rt = rt6_ex->rt6i;
>
> -   if (atomic_read(&rt->dst.__refcnt) == 1 &&
> -   time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> +   /* we are pruning and obsoleting the exception route even if others
> +* still hold a reference to it, so that on the next dst_check() such
> +* a reference can be dropped
> +*/
> +   if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {

Why do we want to change this behavior? Before my patch series, cached
routes were only deleted from the tree in fib6_age() when
rt->dst.__refcnt == 1, isn't it?

> RT6_TRACE("aging clone %p\n", rt);
> rt6_remove_exception(bucket, rt6_ex);
> return;
> --
> 2.13.6
>
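
As background for the aging test in the hunk above: time_after_eq() compares
tick counters in a way that stays correct when the counter wraps. A small
userspace sketch of that comparison follows, using 32-bit counters for
brevity. The helper names are hypothetical, and the cast to a signed type
assumes the usual two's complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wrap-safe "a is at or after b" for free-running tick counters.
 * Valid as long as the two values are less than half the counter
 * range apart.
 */
int tick_after_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* An entry has expired once 'now' reaches lastuse + timeout. */
int entry_expired(uint32_t now, uint32_t lastuse, uint32_t timeout)
{
	return tick_after_eq(now, lastuse + timeout);
}
```

A plain `now >= lastuse + timeout` would misfire when the deadline wraps
past zero, which the subtraction form avoids.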

Re: [PATCH net-next 1/3] ipv6: fix route cache dump

2017-10-17 Thread Eric Dumazet

On Tue, Oct 17, 2017 at 11:26 AM, Wei Wang  wrote:
> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
>> After the commit 2b760fcf5cfb ("ipv6: hook up exception table to
>> store dst cache"), entries in the routing cache are not shown by:
>>
>> ip route show cache
>
> Hi Paolo,
>
> Thanks for doing this.
> But I think your patch does not take care of the case where there are
> a lot of cached routes in the exception table and 1 skb is just not
> enough to dump the main route + all cached routes in the exception
> table.
> In this case, your patch will keep dumping the same main route.
>
> I think some logic needs to be incorporated into the fib6_walk() so
> that it can also remember the last dumped cached route if necessary in
> the exception table and start from there for the next dump.
> I do have a patch for that and that patch tries to keep a linked list
> of all cached routes from the exception table in the walker struct and
> remove any routes that are already dumped.
> It is a bit complicated and might not be the best solution. And as
> IPv4 already does not support dumping cached routes, I did not send
> that out in the previous patch series.

Yes, since we no longer dump IPV4 cached routes, I doubt anyone
depends on IPv6 cached routes, but not on IPv4 ones.

Paolo, do you have a concrete use case for this ?

Re: [PATCH net-next 2/3] ipv6: start fib6 gc on RTF_CACHE dst creation

2017-10-17 Thread Wei Wang

On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> After the commit 2b760fcf5cfb ("ipv6: hook up exception
> table to store dst cache"), the fib6 gc is not started after
> the creation of a RTF_CACHE via a redirect or pmtu update, since
> fib6_add() isn't invoked anymore for such dsts.
>
> We need the fib6 gc to run periodically to clean the RTF_CACHE,
> or the dst will stay there forever.
>
> Fix it by explicitly calling fib6_force_start_gc() on successful
> exception creation. gc_args->more accounting will ensure that
> the gc timer will run for whatever time needed to properly
> clean the table.
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
> ---
Acked-by: Wei Wang 

Totally true. Thanks for catching this.

>  net/ipv6/route.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 5bb53dbd4fd3..8b25a31b6b03 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1340,8 +1340,10 @@ static int rt6_insert_exception(struct rt6_info *nrt,
> spin_unlock_bh(&rt6_exception_lock);
>
> /* Update fn->fn_sernum to invalidate all cached dst */
> -   if (!err)
> +   if (!err) {
> fib6_update_sernum(ort);
> +   fib6_force_start_gc(net);
> +   }
>
> return err;
>  }
> --
> 2.13.6
>

[PATCH] mac80211: use constant time comparison with keys

2017-10-17 Thread Jason A. Donenfeld

Otherwise we risk leaking information via timing side channel.

Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything")
Signed-off-by: Jason A. Donenfeld 
---
 net/mac80211/key.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/key.c b/net/mac80211/key.c
index ae995c8480db..035d16fe926e 100644
--- a/net/mac80211/key.c
+++ b/net/mac80211/key.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "ieee80211_i.h"
 #include "driver-ops.h"
@@ -635,7 +636,7 @@ int ieee80211_key_link(struct ieee80211_key *key,
 * new version of the key to avoid nonce reuse or replay issues.
 */
if (old_key && key->conf.keylen == old_key->conf.keylen &&
-   !memcmp(key->conf.key, old_key->conf.key, key->conf.keylen)) {
+   !crypto_memneq(key->conf.key, old_key->conf.key, key->conf.keylen)) {
ieee80211_key_free_unused(key);
ret = 0;
goto out;
-- 
2.14.2
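
To illustrate why memcmp() is a concern here: it may return as soon as the
first differing byte is found, so its run time can depend on how many
leading bytes match. A constant-time comparison always touches every byte.
The sketch below shows the general technique in userspace C. It is not the
kernel's crypto_memneq() implementation, and ct_memneq is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns nonzero if the buffers differ, 0 if they are equal.
 * Every byte is read and folded into 'diff', so the loop does not
 * exit early at the first mismatch.
 */
int ct_memneq(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];
	return diff != 0;
}
```

Note the inverted sense compared with memcmp(): the result is "not equal",
which is why the patch keeps the leading `!` in the condition. A portable C
version like this offers no hard guarantee against compiler optimizations,
so real code should use the library primitive.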

[net-next 04/15] i40e: fix clearing link masks in i40e_get_link_ksettings

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This fixes two issues in i40e_get_link_ksettings.  It adds calls to
ethtool_link_ksettings_zero_link_mode to make sure advertising and
supported link masks are cleared before we start setting bits in them.

This also replaces some funky bit manipulations with a much nicer call
to ethtool_link_ksettings_del_link_mode when removing link modes.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index f4a70ef3f2e0..fe0b2327de5b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -600,7 +600,9 @@ static int i40e_get_link_ksettings(struct net_device *netdev,
struct i40e_hw *hw = &pf->hw;
struct i40e_link_status *hw_link_info = &hw->phy.link_info;
bool link_up = hw_link_info->link_info & I40E_AQ_LINK_UP;
-   u32 advertising;
+
+   ethtool_link_ksettings_zero_link_mode(ks, supported);
+   ethtool_link_ksettings_zero_link_mode(ks, advertising);
 
if (link_up)
i40e_get_settings_link_up(hw, ks, netdev, pf);
@@ -664,13 +666,9 @@ static int i40e_get_link_ksettings(struct net_device *netdev,
 Asym_Pause);
break;
default:
-   ethtool_convert_link_mode_to_legacy_u32(
-   &advertising, ks->link_modes.advertising);
-
-   advertising &= ~(ADVERTISED_Pause | ADVERTISED_Asym_Pause);
-
-   ethtool_convert_legacy_u32_to_link_mode(
-   ks->link_modes.advertising, advertising);
+   ethtool_link_ksettings_del_link_mode(ks, advertising, Pause);
+   ethtool_link_ksettings_del_link_mode(ks, advertising,
+Asym_Pause);
break;
}
 
-- 
2.14.2
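
The two fixes in this patch come down to ordinary bitmap hygiene: zero the
mask before OR-ing bits into it, and clear a bit in place rather than
round-tripping through a 32-bit legacy value. A userspace sketch of that
pattern follows. The link_mask type, the mask_* helpers and the 96-bit
width are assumptions for illustration and are not the ethtool macros.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MASK_NBITS	96	/* hypothetical; wider than a legacy u32 */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	((MASK_NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct link_mask {
	unsigned long bits[MASK_WORDS];
};

/* Clear every bit; call this before building a mask from scratch. */
void mask_zero(struct link_mask *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

void mask_add(struct link_mask *m, unsigned int bit)
{
	assert(bit < MASK_NBITS);
	m->bits[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Remove one mode in place; all other bits are left untouched. */
void mask_del(struct link_mask *m, unsigned int bit)
{
	assert(bit < MASK_NBITS);
	m->bits[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}

int mask_test(const struct link_mask *m, unsigned int bit)
{
	assert(bit < MASK_NBITS);
	return (m->bits[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

Deleting through a 32-bit intermediate would silently drop any mode above
bit 31, which is one reason to prefer the in-place form.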

[net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-17

2017-10-17 Thread Jeff Kirsher

This series contains updates to i40e and ethtool.

Alan provides most of the changes in this series which are mainly fixes
and cleanups.  Renamed the ethtool "cmd" variable to "ks", since the new
ethtool API passes us ksettings structs instead of command structs.
Cleaned up an ifdef that was not accomplishing anything.  Added function
header comments to provide better documentation.  Fixed two issues in
i40e_get_link_ksettings(): it now calls
ethtool_link_ksettings_zero_link_mode() to ensure the advertising and
supported link masks are cleared before we start setting bits, and it
removes link modes with ethtool_link_ksettings_del_link_mode() instead of
manual bit manipulation.  Cleaned up and fixed code comments which were
incorrect.  Separated the setting of autoneg in
i40e_phy_type_to_ethtool() into its own conditional to clarify which PHYs
support and advertise autoneg and to make it easier to add new PHY types
in the future.  Added ethtool functionality to intersect two link masks
together to find the common ground between them.  Overhauled i40e to
ensure that the new ethtool API macros are being used, instead of the
old ones.  Fixed the usage of unsigned 64-bit division which is not
supported on all architectures.

Sudheer adds support for 25G Active Optical Cables (AOC) and Active Copper
Cables (ACC) PHY types.

The following are changes since commit 8a5f2166a6288ee4b5a393f1ebc8cfb26b0510f0:
  net: export netdev_txq_to_tc to allow sch_mqprio to compile as module
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (14):
  i40e: rename 'cmd' variables in ethtool interface
  i40e: remove ifdef SPEED_25000
  i40e: add function header for i40e_get_rxfh
  i40e: fix clearing link masks in i40e_get_link_ksettings
  i40e: fix i40e_phy_type_to_ethtool function header
  i40e: fix comment typo
  i40e: fix whitespace issues in i40e_ethtool.c
  i40e: group autoneg PHY types together
  ethtool: add ethtool_intersect_link_masks
  i40e: convert i40e_phy_type_to_ethtool to new API
  i40e: convert i40e_get_settings_link_up to new API
  i40e: rename 'change' variable to 'autoneg_changed'
  i40e: convert i40e_set_link_ksettings to new API
  i40e: fix u64 division usage

Sudheer Mogilappagari (1):
  i40e: Add new PHY types for 25G AOC and ACC support

 drivers/net/ethernet/intel/i40e/i40e.h |   3 +-
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |   4 +
 drivers/net/ethernet/intel/i40e/i40e_common.c  |   2 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 721 +
 drivers/net/ethernet/intel/i40e/i40e_main.c|  58 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h|   4 +
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h|   4 +
 include/linux/ethtool.h|  10 +
 net/core/ethtool.c |  16 +
 9 files changed, 521 insertions(+), 301 deletions(-)

-- 
2.14.2
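
The "intersect two link masks" helper mentioned in the summary is, in
essence, a word-by-word AND over two bitmaps. The sketch below shows that
operation in userspace C. It does not reproduce the signature of
ethtool_intersect_link_masks(); mask_intersect and its return value are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * dst[i] = a[i] & b[i] for each word of the mask.
 * Returns 1 if the two masks share at least one bit, 0 otherwise.
 */
int mask_intersect(unsigned long *dst, const unsigned long *a,
		   const unsigned long *b, size_t nwords)
{
	unsigned long any = 0;
	size_t i;

	assert(dst != NULL && a != NULL && b != NULL);
	for (i = 0; i < nwords; i++) {
		dst[i] = a[i] & b[i];
		any |= dst[i];
	}
	return any != 0;
}
```

In the driver's terms this is how "what the PHY type could support" and
"what the NVM allows" would be reduced to what is truly supported.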

[net-next 06/15] i40e: fix comment typo

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

Someone forgot a word in this comment and it's confusing without it.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a137675c1426..e40fb559dacb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -516,8 +516,8 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
}
 
/* Now that we've worked out everything that could be supported by the
-* current PHY type, get what is supported by the NVM and them to
-* get what is truly supported
+* current PHY type, get what is supported by the NVM and intersect
+* them to get what is truly supported
 */
i40e_phy_type_to_ethtool(pf, &e_supported,
 &e_advertising);
-- 
2.14.2

[net-next 05/15] i40e: fix i40e_phy_type_to_ethtool function header

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

The function header erroneously listed 'phy_types' as a parameter.  The
correct parameter is 'pf'.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index fe0b2327de5b..a137675c1426 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -253,7 +253,7 @@ static void i40e_partition_setting_complaint(struct i40e_pf 
*pf)
 
 /**
  * i40e_phy_type_to_ethtool - convert the phy_types to ethtool link modes
- * @phy_types: PHY types to convert
+ * @pf: PF struct with phy_types
  * @supported: pointer to the ethtool supported variable to fill in
  * @advertising: pointer to the ethtool advertising variable to fill in
  *
-- 
2.14.2

[net-next 08/15] i40e: group autoneg PHY types together

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This separates the setting of autoneg in i40e_phy_type_to_ethtool into
its own conditional.  Doing this adds clarity as to what PHYs
support/advertise autoneg and makes it easier to add new PHY types in
the future.

This also fixes an issue on devices with CRT_RETIMER where advertising
autoneg was being set, but supported autoneg was not.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 85 +-
 1 file changed, 41 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 89ab398a7d30..30deae77e745 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -268,9 +268,7 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, 
u32 *supported,
*advertising = 0x0;
 
if (phy_types & I40E_CAP_PHY_TYPE_SGMII) {
-   *supported |= SUPPORTED_Autoneg |
- SUPPORTED_1000baseT_Full;
-   *advertising |= ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_1000baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
*advertising |= ADVERTISED_1000baseT_Full;
if (pf->hw_features & I40E_HW_100M_SGMII_CAPABLE) {
@@ -289,9 +287,7 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, 
u32 *supported,
phy_types & I40E_CAP_PHY_TYPE_10GBASE_T ||
phy_types & I40E_CAP_PHY_TYPE_10GBASE_SR ||
phy_types & I40E_CAP_PHY_TYPE_10GBASE_LR) {
-   *supported |= SUPPORTED_Autoneg |
- SUPPORTED_10000baseT_Full;
-   *advertising |= ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_10000baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_10GB)
*advertising |= ADVERTISED_10000baseT_Full;
}
@@ -301,16 +297,12 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, 
u32 *supported,
*supported |= SUPPORTED_40000baseCR4_Full;
if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_CR4_CU ||
phy_types & I40E_CAP_PHY_TYPE_40GBASE_CR4) {
-   *supported |= SUPPORTED_Autoneg |
- SUPPORTED_40000baseCR4_Full;
-   *advertising |= ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_40000baseCR4_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_40GB)
*advertising |= ADVERTISED_40000baseCR4_Full;
}
if (phy_types & I40E_CAP_PHY_TYPE_100BASE_TX) {
-   *supported |= SUPPORTED_Autoneg |
- SUPPORTED_100baseT_Full;
-   *advertising |= ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_100baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_100MB)
*advertising |= ADVERTISED_100baseT_Full;
}
@@ -318,9 +310,7 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, 
u32 *supported,
phy_types & I40E_CAP_PHY_TYPE_1000BASE_SX ||
phy_types & I40E_CAP_PHY_TYPE_1000BASE_LX ||
phy_types & I40E_CAP_PHY_TYPE_1000BASE_T_OPTICAL) {
-   *supported |= SUPPORTED_Autoneg |
- SUPPORTED_1000baseT_Full;
-   *advertising |= ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_1000baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
*advertising |= ADVERTISED_1000baseT_Full;
}
@@ -329,47 +319,54 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, 
u32 *supported,
if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_LR4)
*supported |= SUPPORTED_40000baseLR4_Full;
if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_KR4) {
-   *supported |= SUPPORTED_40000baseKR4_Full |
- SUPPORTED_Autoneg;
-   *advertising |= ADVERTISED_40000baseKR4_Full |
-   ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_40000baseKR4_Full;
+   *advertising |= ADVERTISED_40000baseKR4_Full;
}
if (phy_types & I40E_CAP_PHY_TYPE_20GBASE_KR2) {
-   *supported |= SUPPORTED_20000baseKR2_Full |
- SUPPORTED_Autoneg;
-   *advertising |= ADVERTISED_Autoneg;
+   *supported |= SUPPORTED_20000baseKR2_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_20GB)
*advertising |= ADVERTISED_20000baseKR2_Full;
}
-   if (phy_types & I40E_CAP_PHY_TYPE_10GBASE_KR) {
-   if

[net-next 03/15] i40e: add function header for i40e_get_rxfh

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

Someone left this poor little function naked with no header.  This
dresses it up in a proper function header it deserves.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c250116e5e22..f4a70ef3f2e0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -3968,6 +3968,16 @@ static u32 i40e_get_rxfh_indir_size(struct net_device 
*netdev)
return I40E_HLUT_ARRAY_SIZE;
 }
 
+/**
+ * i40e_get_rxfh - get the rx flow hash indirection table
+ * @netdev: network interface device structure
+ * @indir: indirection table
+ * @key: hash key
+ * @hfunc: hash function
+ *
+ * Reads the indirection table directly from the hardware. Returns 0 on
+ * success.
+ **/
 static int i40e_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
 u8 *hfunc)
 {
-- 
2.14.2

[net-next 12/15] i40e: convert i40e_get_settings_link_up to new API

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This removes references to old ethtool API macros and functions in
i40e_get_settings_link_up as part of the process of converting to the
new API.  The new API also allows us to provide more explicit support
for new 25G and 10G PHY types so some of the PHY types have been
adjusted where necessary as well.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 187 +
 1 file changed, 125 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 0cef8aa85c1d..913ba91fac6c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -474,125 +474,192 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
struct i40e_link_status *hw_link_info = &hw->phy.link_info;
struct ethtool_link_ksettings cap_ksettings;
u32 link_speed = hw_link_info->link_speed;
-   u32 supported, advertising;
-
-   ethtool_convert_link_mode_to_legacy_u32(&supported,
-   ks->link_modes.supported);
-   ethtool_convert_link_mode_to_legacy_u32(&advertising,
-   ks->link_modes.advertising);
 
/* Initialize supported and advertised settings based on phy settings */
switch (hw_link_info->phy_type) {
case I40E_PHY_TYPE_40GBASE_CR4:
case I40E_PHY_TYPE_40GBASE_CR4_CU:
-   supported = SUPPORTED_Autoneg |
-   SUPPORTED_40000baseCR4_Full;
-   advertising = ADVERTISED_Autoneg |
- ADVERTISED_40000baseCR4_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported, Autoneg);
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+40000baseCR4_Full);
+   ethtool_link_ksettings_add_link_mode(ks, advertising, Autoneg);
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+40000baseCR4_Full);
break;
case I40E_PHY_TYPE_XLAUI:
case I40E_PHY_TYPE_XLPPI:
case I40E_PHY_TYPE_40GBASE_AOC:
-   supported = SUPPORTED_40000baseCR4_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+40000baseCR4_Full);
break;
case I40E_PHY_TYPE_40GBASE_SR4:
-   supported = SUPPORTED_40000baseSR4_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+40000baseSR4_Full);
break;
case I40E_PHY_TYPE_40GBASE_LR4:
-   supported = SUPPORTED_40000baseLR4_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+40000baseLR4_Full);
break;
+   case I40E_PHY_TYPE_25GBASE_SR:
+   case I40E_PHY_TYPE_25GBASE_LR:
case I40E_PHY_TYPE_10GBASE_SR:
case I40E_PHY_TYPE_10GBASE_LR:
case I40E_PHY_TYPE_1000BASE_SX:
case I40E_PHY_TYPE_1000BASE_LX:
-   supported = SUPPORTED_10000baseT_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported, Autoneg);
+   ethtool_link_ksettings_add_link_mode(ks, advertising, Autoneg);
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+25000baseSR_Full);
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+25000baseSR_Full);
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+10000baseSR_Full);
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+10000baseSR_Full);
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+10000baseLR_Full);
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+10000baseLR_Full);
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+1000baseX_Full);
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+1000baseX_Full);
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+10000baseT_Full);
if (hw_link_info->module_type[2] &
I40E_MODULE_TYPE_1000BASE_SX ||

[net-next 09/15] i40e: Add new PHY types for 25G AOC and ACC support

2017-10-17 Thread Jeff Kirsher

From: Sudheer Mogilappagari 

This patch adds support for 25G Active Optical Cables (AOC) and Active
Copper Cables (ACC) PHY types.

Signed-off-by: Sudheer Mogilappagari 
Signed-off-by: Krzysztof Malek 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 4 
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 2 ++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c  | 2 ++
 drivers/net/ethernet/intel/i40e/i40e_type.h | 4 
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 4 
 5 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index a8f65aed5421..6a5db1b33fa2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1771,6 +1771,8 @@ enum i40e_aq_phy_type {
I40E_PHY_TYPE_25GBASE_CR= 0x20,
I40E_PHY_TYPE_25GBASE_SR= 0x21,
I40E_PHY_TYPE_25GBASE_LR= 0x22,
+   I40E_PHY_TYPE_25GBASE_AOC   = 0x23,
+   I40E_PHY_TYPE_25GBASE_ACC   = 0x24,
I40E_PHY_TYPE_MAX,
I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP   = 0xFD,
I40E_PHY_TYPE_EMPTY = 0xFE,
@@ -1831,6 +1833,8 @@ struct i40e_aq_get_phy_abilities_resp {
 #define I40E_AQ_PHY_TYPE_EXT_25G_CR0X02
 #define I40E_AQ_PHY_TYPE_EXT_25G_SR0x04
 #define I40E_AQ_PHY_TYPE_EXT_25G_LR0x08
+#define I40E_AQ_PHY_TYPE_EXT_25G_AOC   0x10
+#define I40E_AQ_PHY_TYPE_EXT_25G_ACC   0x20
u8  fec_cfg_curr_mod_ext_info;
 #define I40E_AQ_ENABLE_FEC_KR  0x01
 #define I40E_AQ_ENABLE_FEC_RS  0x02
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index aeb497258f20..8d0ee006606b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1180,6 +1180,8 @@ static enum i40e_media_type i40e_get_media_type(struct 
i40e_hw *hw)
case I40E_PHY_TYPE_40GBASE_AOC:
case I40E_PHY_TYPE_10GBASE_AOC:
case I40E_PHY_TYPE_25GBASE_CR:
+   case I40E_PHY_TYPE_25GBASE_AOC:
+   case I40E_PHY_TYPE_25GBASE_ACC:
media = I40E_MEDIA_TYPE_DA;
break;
case I40E_PHY_TYPE_1000BASE_KX:
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 30deae77e745..a4210ccdaa5f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -502,6 +502,8 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
case I40E_PHY_TYPE_25GBASE_CR:
case I40E_PHY_TYPE_25GBASE_SR:
case I40E_PHY_TYPE_25GBASE_LR:
+   case I40E_PHY_TYPE_25GBASE_AOC:
+   case I40E_PHY_TYPE_25GBASE_ACC:
supported = SUPPORTED_Autoneg;
advertising = ADVERTISED_Autoneg;
/* TODO: add speeds when ethtool is ready to support*/
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h 
b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 0410fcbdbb94..17a99b53acd9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -271,6 +271,10 @@ struct i40e_phy_info {
 I40E_PHY_TYPE_OFFSET)
 #define I40E_CAP_PHY_TYPE_25GBASE_LR BIT_ULL(I40E_PHY_TYPE_25GBASE_LR + \
 I40E_PHY_TYPE_OFFSET)
+#define I40E_CAP_PHY_TYPE_25GBASE_AOC BIT_ULL(I40E_PHY_TYPE_25GBASE_AOC + \
+I40E_PHY_TYPE_OFFSET)
+#define I40E_CAP_PHY_TYPE_25GBASE_ACC BIT_ULL(I40E_PHY_TYPE_25GBASE_ACC + \
+I40E_PHY_TYPE_OFFSET)
 #define I40E_HW_CAP_MAX_GPIO   30
 /* Capabilities of a PF or a VF or the whole device */
 struct i40e_hw_capabilities {
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 60c892f559b9..463e331a70a9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1767,6 +1767,8 @@ enum i40e_aq_phy_type {
I40E_PHY_TYPE_25GBASE_CR= 0x20,
I40E_PHY_TYPE_25GBASE_SR= 0x21,
I40E_PHY_TYPE_25GBASE_LR= 0x22,
+   I40E_PHY_TYPE_25GBASE_AOC   = 0x23,
+   I40E_PHY_TYPE_25GBASE_ACC   = 0x24,
I40E_PHY_TYPE_MAX,
I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP   = 0xFD,
I40E_PHY_TYPE_EMPTY = 0xFE,
@@ -1827,6 +1829,8 @@ struct i40e_aq_get_phy_abilities_resp {
 #define

[net-next 07/15] i40e: fix whitespace issues in i40e_ethtool.c

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

There are a number of minor incidental whitespace issues in this file.
This addresses most of the ones I could find.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 44 +++---
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index e40fb559dacb..89ab398a7d30 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -511,7 +511,8 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
break;
default:
/* if we got here and link is up something bad is afoot */
-   netdev_info(netdev, "WARNING: Link is up but PHY type 0x%x is not recognized.\n",
+   netdev_info(netdev,
+   "WARNING: Link is up but PHY type 0x%x is not recognized.\n",
hw_link_info->phy_type);
}
 
@@ -614,14 +615,12 @@ static int i40e_get_link_ksettings(struct net_device 
*netdev,
ks->base.autoneg = ((hw_link_info->an_info & I40E_AQ_AN_COMPLETED) ?
AUTONEG_ENABLE : AUTONEG_DISABLE);
 
+   /* Set media type settings */
switch (hw->phy.media_type) {
case I40E_MEDIA_TYPE_BACKPLANE:
-   ethtool_link_ksettings_add_link_mode(ks, supported,
-Autoneg);
-   ethtool_link_ksettings_add_link_mode(ks, supported,
-Backplane);
-   ethtool_link_ksettings_add_link_mode(ks, advertising,
-Autoneg);
+   ethtool_link_ksettings_add_link_mode(ks, supported, Autoneg);
+   ethtool_link_ksettings_add_link_mode(ks, supported, Backplane);
+   ethtool_link_ksettings_add_link_mode(ks, advertising, Autoneg);
ethtool_link_ksettings_add_link_mode(ks, advertising,
 Backplane);
ks->base.port = PORT_NONE;
@@ -652,16 +651,14 @@ static int i40e_get_link_ksettings(struct net_device 
*netdev,
 
switch (hw->fc.requested_mode) {
case I40E_FC_FULL:
-   ethtool_link_ksettings_add_link_mode(ks, advertising,
-Pause);
+   ethtool_link_ksettings_add_link_mode(ks, advertising, Pause);
break;
case I40E_FC_TX_PAUSE:
ethtool_link_ksettings_add_link_mode(ks, advertising,
 Asym_Pause);
break;
case I40E_FC_RX_PAUSE:
-   ethtool_link_ksettings_add_link_mode(ks, advertising,
-Pause);
+   ethtool_link_ksettings_add_link_mode(ks, advertising, Pause);
ethtool_link_ksettings_add_link_mode(ks, advertising,
 Asym_Pause);
break;
@@ -708,17 +705,14 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
i40e_partition_setting_complaint(pf);
return -EOPNOTSUPP;
}
-
if (vsi != pf->vsi[pf->lan_vsi])
return -EOPNOTSUPP;
-
if (hw->phy.media_type != I40E_MEDIA_TYPE_BASET &&
hw->phy.media_type != I40E_MEDIA_TYPE_FIBER &&
hw->phy.media_type != I40E_MEDIA_TYPE_BACKPLANE &&
hw->phy.media_type != I40E_MEDIA_TYPE_DA &&
hw->phy.link_info.link_info & I40E_AQ_LINK_UP)
return -EOPNOTSUPP;
-
if (hw->device_id == I40E_DEV_ID_KX_B ||
hw->device_id == I40E_DEV_ID_KX_C ||
hw->device_id == I40E_DEV_ID_20G_KR2 ||
@@ -844,7 +838,6 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
 */
if (!config.link_speed)
config.link_speed = abilities.link_speed;
-
if (change || (abilities.link_speed != config.link_speed)) {
/* copy over the rest of the abilities */
config.phy_type = abilities.phy_type;
@@ -872,7 +865,8 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
/* make the aq call */
status = i40e_aq_set_phy_config(hw, &config, NULL);
if (status) {
-   netdev_info(netdev, "Set phy config failed, err %s aq_err %s\n",
+   netdev_info(netdev,
+   "Set phy config failed, err %s aq_err %s\n",
i40e_stat_str(hw, status),
i40e_aq_str(hw,

[net-next 10/15] ethtool: add ethtool_intersect_link_masks

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This function provides a way to intersect two link masks together to
find the common ground between them.  For example in i40e, the driver
first generates link masks for what is supported by the PHY type.  The
driver then gets the link masks for what the NVM supports.  The
resulting intersection between them yields what can truly be supported.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 include/linux/ethtool.h | 10 ++
 net/core/ethtool.c  | 16 
 2 files changed, 26 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 4587a4c36923..c77fa3529e15 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -163,6 +163,16 @@ extern int
 __ethtool_get_link_ksettings(struct net_device *dev,
 struct ethtool_link_ksettings *link_ksettings);
 
+/**
+ * ethtool_intersect_link_masks - Given two link masks, AND them together
+ * @dst: first mask and where result is stored
+ * @src: second mask to intersect with
+ *
+ * Given two link mode masks, AND them together and save the result in dst.
+ */
+void ethtool_intersect_link_masks(struct ethtool_link_ksettings *dst,
+ struct ethtool_link_ksettings *src);
+
 void ethtool_convert_legacy_u32_to_link_mode(unsigned long *dst,
 u32 legacy_u32);
 
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 3228411ada0f..0c406306792a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -403,6 +403,22 @@ static int __ethtool_set_flags(struct net_device *dev, u32 
data)
return 0;
 }
 
+/* Given two link masks, AND them together and save the result in dst. */
+void ethtool_intersect_link_masks(struct ethtool_link_ksettings *dst,
+ struct ethtool_link_ksettings *src)
+{
+   unsigned int size = BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS);
+   unsigned int idx = 0;
+
+   for (; idx < size; idx++) {
+   dst->link_modes.supported[idx] &=
+   src->link_modes.supported[idx];
+   dst->link_modes.advertising[idx] &=
+   src->link_modes.advertising[idx];
+   }
+}
+EXPORT_SYMBOL(ethtool_intersect_link_masks);
+
 void ethtool_convert_legacy_u32_to_link_mode(unsigned long *dst,
 u32 legacy_u32)
 {
-- 
2.14.2
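
The intersection described in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: every `demo_` name, the 70-bit mask width, and the bit helpers are made up for the example. Only the word-by-word AND over both masks mirrors the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's link-mode bitmap sizing. */
#define DEMO_LINK_MODE_NBITS 70u
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS \
	((DEMO_LINK_MODE_NBITS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Simplified analogue of the supported/advertising pair in the patch. */
struct demo_link_modes {
	unsigned long supported[DEMO_NWORDS];
	unsigned long advertising[DEMO_NWORDS];
};

/* Set one link-mode bit in a mask. */
static void demo_set_bit(unsigned long *map, unsigned int bit)
{
	assert(bit < DEMO_LINK_MODE_NBITS);
	map[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

/* Return 1 if the link-mode bit is set, 0 otherwise. */
static int demo_test_bit(const unsigned long *map, unsigned int bit)
{
	assert(bit < DEMO_LINK_MODE_NBITS);
	return (int)((map[bit / DEMO_BITS_PER_LONG] >>
		      (bit % DEMO_BITS_PER_LONG)) & 1UL);
}

/* AND src into dst, one word at a time, for both masks. */
static void demo_intersect_link_masks(struct demo_link_modes *dst,
				      const struct demo_link_modes *src)
{
	size_t idx;

	for (idx = 0; idx < DEMO_NWORDS; idx++) {
		dst->supported[idx] &= src->supported[idx];
		dst->advertising[idx] &= src->advertising[idx];
	}
}
```

For instance, if a PHY-derived mask has bits 5 and 65 set and an NVM-derived mask has bits 12 and 65 set, only bit 65 survives the intersection, which is the "common ground" the commit message refers to.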

Re: [PATCH net-next 1/3] ipv6: fix route cache dump

2017-10-17 Thread Wei Wang

On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni  wrote:
> After the commit 2b760fcf5cfb ("ipv6: hook up exception table to
> store dst cache"), entries in the routing cache are not shown by:
>
> ip route show cache
>
> because the per route exception table containing such routes is not
> traversed by rt6_dump_route().
> Fix it by explicitly dumping all routes present into the
> rt6i_exception_bucket.
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv6/route.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 01a103c23a6c..5bb53dbd4fd3 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4190,10 +4190,14 @@ static int rt6_fill_node(struct net *net,
> return -EMSGSIZE;
>  }
>
> +/* this is called under the RCU lock */
>  int rt6_dump_route(struct rt6_info *rt, void *p_arg)
>  {
> struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
> +   struct rt6_exception_bucket *bucket;
> +   struct rt6_exception *rt6_ex;
> struct net *net = arg->net;
> +   int err, port_id, seq, i;
>
> if (rt == net->ipv6.ip6_null_entry)
> return 0;
> @@ -4209,10 +4213,28 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
> }
> }
>
> -   return rt6_fill_node(net,
> -arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
> -NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
> -NLM_F_MULTI);
> +   /* dump exceptions table, if available */
> +   port_id = NETLINK_CB(arg->cb->skb).portid;
> +   seq = arg->cb->nlh->nlmsg_seq;
> +   bucket = rcu_dereference(rt->rt6i_exception_bucket);
> +   if (!bucket)
> +   goto no_exceptions;
> +
> +   for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
> +   hlist_for_each_entry_rcu(rt6_ex, &bucket->chain, hlist) {
> +   err = rt6_fill_node(net, arg->skb, rt6_ex->rt6i, NULL,
> +   NULL, 0, RTM_NEWROUTE, port_id, seq,
> +   NLM_F_MULTI);
> +   if (err)
> +   return err;
> +   }
> +
> +   bucket++;
> +   }
> +
> +no_exceptions:
> +   return rt6_fill_node(net, arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
> +port_id, seq, NLM_F_MULTI);
>  }
>

Hi Paolo,

Thanks for doing this.
But I think your patch does not handle the case where the exception
table holds so many cached routes that a single skb is not enough to
dump the main route plus all of its cached routes.
In that case, your patch will keep dumping the same main route.

I think some logic needs to be incorporated into the fib6_walk() so
that it can also remember the last dumped cached route if necessary in
the exception table and start from there for the next dump.
I do have a patch for that and that patch tries to keep a linked list
of all cached routes from the exception table in the walker struct and
remove any routes that are already dumped.
It is a bit complicated and might not be the best solution. And as
IPv4 already does not support dumping cached routes, I did not send
that out in the previous patch series.
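
The resumption problem can be sketched in userspace C. This is a rough illustration only: it uses a hypothetical per-route cursor (`skip`) instead of the linked list of cached routes mentioned above, and none of the `demo_` names exist in the kernel. The point is that a dump which runs out of room has to record where it stopped, or the next pass starts the same route over from its first entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cursor: entries of the current route already emitted. */
struct demo_dump_state {
	size_t skip;
};

/* Hypothetical fixed-size output buffer standing in for one skb. */
struct demo_buf {
	int *slots;
	size_t cap;
	size_t len;
};

/* Append one entry, or fail with -EMSGSIZE when the buffer is full. */
static int demo_emit(struct demo_buf *buf, int id)
{
	if (buf->len == buf->cap)
		return -EMSGSIZE;
	buf->slots[buf->len++] = id;
	return 0;
}

/*
 * Dump the cached routes followed by the main route (entries[n - 1]),
 * resuming at st->skip.  On -EMSGSIZE the cursor records the first
 * entry that did not fit, so the next call continues from there.
 */
static int demo_dump_route(const int *entries, size_t n,
			   struct demo_dump_state *st, struct demo_buf *buf)
{
	size_t i;

	assert(st->skip <= n);
	for (i = st->skip; i < n; i++) {
		int err = demo_emit(buf, entries[i]);

		if (err) {
			st->skip = i;
			return err;
		}
	}
	st->skip = 0;	/* route fully dumped; reset for the next route */
	return 0;
}
```

Without the cursor, every retry would restart at entry 0 and a route with more entries than fit in one buffer would never finish dumping.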


>  static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
> --
> 2.13.6
>

[net-next 15/15] i40e: fix u64 division usage

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

Commit 52eb1ff93e98 ("i40e: Add support setting TC max bandwidth rates")
and commit 1ea6f21ae530 ("i40e: Refactor VF BW rate limiting") add some
needed functionality for TC bandwidth rate limiting.  Unfortunately they
introduce several usages of unsigned 64-bit division, which needs to be
handled specially by the kernel to support all architectures.

Fixes: 52eb1ff93e98 ("i40e: Add support setting TC max bandwidth rates")
Fixes: 1ea6f21ae530 ("i40e: Refactor VF BW rate limiting")

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h  |  3 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c | 58 -
 2 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 266e1dc5e786..eb017763646d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -130,7 +130,8 @@
 
 /* BW rate limiting */
 #define I40E_BW_CREDIT_DIVISOR 50 /* 50Mbps per BW credit */
-#define I40E_MAX_BW_INACTIVE_ACCUM 4  /* accumulate 4 credits max */
+#define I40E_BW_MBPS_DIVISOR   125000 /* rate / (1000000 / 8) Mbps */
+#define I40E_MAX_BW_INACTIVE_ACCUM 4 /* accumulate 4 credits max */
 
 /* driver state flags */
 enum i40e_state_t {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index bb31d53c4923..1252aaf92fd3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5442,6 +5442,7 @@ int i40e_get_link_speed(struct i40e_vsi *vsi)
 int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate)
 {
struct i40e_pf *pf = vsi->back;
+   u64 credits = 0;
int speed = 0;
int ret = 0;
 
@@ -5459,8 +5460,9 @@ int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 
max_tx_rate)
}
 
/* Tx rate credits are in values of 50Mbps, 0 is disabled */
-   ret = i40e_aq_config_vsi_bw_limit(&pf->hw, seid,
- max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+   credits = max_tx_rate;
+   do_div(credits, I40E_BW_CREDIT_DIVISOR);
+   ret = i40e_aq_config_vsi_bw_limit(&pf->hw, seid, credits,
  I40E_MAX_BW_INACTIVE_ACCUM, NULL);
if (ret)
dev_err(&pf->pdev->dev,
@@ -6063,13 +6065,17 @@ int i40e_create_queue_channel(struct i40e_vsi *vsi,
 
/* configure VSI for BW limit */
if (ch->max_tx_rate) {
+   u64 credits = ch->max_tx_rate;
+
if (i40e_set_bw_limit(vsi, ch->seid, ch->max_tx_rate))
return -EINVAL;
 
+   do_div(credits, I40E_BW_CREDIT_DIVISOR);
dev_dbg(&pf->pdev->dev,
"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
ch->max_tx_rate,
-   ch->max_tx_rate / I40E_BW_CREDIT_DIVISOR, ch->seid);
+   credits,
+   ch->seid);
}
 
/* in case of VF, this will be main SRIOV VSI */
@@ -6090,6 +6096,7 @@ int i40e_create_queue_channel(struct i40e_vsi *vsi,
 static int i40e_configure_queue_channels(struct i40e_vsi *vsi)
 {
struct i40e_channel *ch;
+   u64 max_rate = 0;
int ret = 0, i;
 
/* Create app vsi with the TCs. Main VSI with TC0 is already set up */
@@ -6110,8 +6117,9 @@ static int i40e_configure_queue_channels(struct i40e_vsi 
*vsi)
/* Bandwidth limit through tc interface is in bytes/s,
 * change to Mbit/s
 */
-   ch->max_tx_rate =
-   vsi->mqprio_qopt.max_rate[i] / (1000000 / 8);
+   max_rate = vsi->mqprio_qopt.max_rate[i];
+   do_div(max_rate, I40E_BW_MBPS_DIVISOR);
+   ch->max_tx_rate = max_rate;
 
list_add_tail(&ch->list, &vsi->ch_list);
 
@@ -6540,6 +6548,7 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 struct tc_mqprio_qopt_offload *mqprio_qopt)
 {
u64 sum_max_rate = 0;
+   u64 max_rate = 0;
int i;
 
if (mqprio_qopt->qopt.offset[0] != 0 ||
@@ -6554,7 +6563,9 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
"Invalid min tx rate (greater than 0) 
specified\n");
return -EINVAL;
}
-   sum_max_rate += (mqprio_qopt->max_rate[i] / (1000000 / 8));
+   max_rate = mqprio_qopt->max_rate[i];
+   do_div(max_rate, I40E_BW_MBPS_DIVISOR);
+   sum_max_rate += max_rate;
 
if (i >=
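
Why the conversion matters: on 32-bit architectures, a plain `/` on a 64-bit operand may compile to a call into a compiler support routine that the kernel does not link, so the kernel's `do_div()` is used instead. It divides a 64-bit value in place by a 32-bit divisor and returns the remainder. The sketch below imitates that calling convention in userspace. The `demo_` names are invented for the example; the two divisors are the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BW_CREDIT_DIVISOR 50	/* 50 Mbps per BW credit */
#define DEMO_BW_MBPS_DIVISOR 125000	/* bytes/s per Mbit/s: 1000000 / 8 */

/*
 * Userspace imitation of the do_div() convention: *n is replaced by the
 * quotient and the remainder is returned.  The real kernel macro takes
 * the variable itself rather than a pointer.
 */
static uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Convert a tc rate in bytes/s to Mbit/s, as the patch does. */
static uint64_t demo_bytes_per_sec_to_mbps(uint64_t rate)
{
	demo_do_div(&rate, DEMO_BW_MBPS_DIVISOR);
	return rate;
}

/* Convert Mbit/s to 50 Mbps credits, rounding down. */
static uint64_t demo_mbps_to_credits(uint64_t mbps)
{
	demo_do_div(&mbps, DEMO_BW_CREDIT_DIVISOR);
	return mbps;
}
```

As a worked case, 125000000 bytes/s is 1000 Mbit/s, which corresponds to 20 credits of 50 Mbps each. Because the division happens in place, the patch copies each rate into a temporary (`credits`, `max_rate`) before dividing.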

[net-next 13/15] i40e: rename 'change' variable to 'autoneg_changed'

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This variable isn't actually very descriptive and makes the code a bit
confusing as to what it is being used for.  This patch enhances the
variable with the longer name, 'autoneg_changed', which makes it clear
we are concerned with autoneg changing in this context.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 913ba91fac6c..9c70555bf49c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -822,14 +822,14 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_aq_get_phy_abilities_resp abilities;
+   struct ethtool_link_ksettings safe_ks;
+   struct ethtool_link_ksettings copy_ks;
struct i40e_aq_set_phy_config config;
struct i40e_pf *pf = np->vsi->back;
struct i40e_vsi *vsi = np->vsi;
struct i40e_hw *hw = &pf->hw;
-   struct ethtool_link_ksettings safe_ks;
-   struct ethtool_link_ksettings copy_ks;
+   bool autoneg_changed = false;
i40e_status status = 0;
-   bool change = false;
int timeout = 50;
int err = 0;
u32 autoneg;
@@ -922,7 +922,7 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
/* Autoneg is allowed to change */
config.abilities = abilities.abilities |
   I40E_AQ_PHY_ENABLE_AN;
-   change = true;
+   autoneg_changed = true;
}
} else {
/* If autoneg is currently enabled */
@@ -942,7 +942,7 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
/* Autoneg is allowed to change */
config.abilities = abilities.abilities &
   ~I40E_AQ_PHY_ENABLE_AN;
-   change = true;
+   autoneg_changed = true;
}
}
 
@@ -976,7 +976,7 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
 */
if (!config.link_speed)
config.link_speed = abilities.link_speed;
-   if (change || (abilities.link_speed != config.link_speed)) {
+   if (autoneg_changed || abilities.link_speed != config.link_speed) {
/* copy over the rest of the abilities */
config.phy_type = abilities.phy_type;
config.phy_type_ext = abilities.phy_type_ext;
-- 
2.14.2

[net-next 14/15] i40e: convert i40e_set_link_ksettings to new API

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This finishes off the conversion to the new ethtool API by removing the
old macros being used in i40e_set_link_ksettings and replacing them with
shiny new ones.

This conversion also allows us to provide link speed support for new 25G
and 10G macros which is included here as well.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 93 --
 1 file changed, 57 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 9c70555bf49c..9eb618799a30 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -832,9 +832,7 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
i40e_status status = 0;
int timeout = 50;
int err = 0;
-   u32 autoneg;
-   u32 advertise;
-   u32 tmp;
+   u8 autoneg;
 
/* Changing port settings is not supported if this isn't the
 * port's controlling PF
@@ -862,28 +860,34 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
/* copy the ksettings to copy_ks to avoid modifying the origin */
memcpy(&copy_ks, ks, sizeof(struct ethtool_link_ksettings));
 
+   /* save autoneg out of ksettings */
+   autoneg = copy_ks.base.autoneg;
+
+   memset(&safe_ks, 0, sizeof(safe_ks));
+   /* Get link modes supported by hardware and check against modes
+* requested by the user.  Return an error if unsupported mode was set.
+*/
+   i40e_phy_type_to_ethtool(pf, &safe_ks);
+   if (!bitmap_subset(copy_ks.link_modes.advertising,
+  safe_ks.link_modes.supported,
+  __ETHTOOL_LINK_MODE_MASK_NBITS))
+   return -EINVAL;
+
/* get our own copy of the bits to check against */
memset(&safe_ks, 0, sizeof(struct ethtool_link_ksettings));
+   safe_ks.base.cmd = copy_ks.base.cmd;
+   safe_ks.base.link_mode_masks_nwords =
+   copy_ks.base.link_mode_masks_nwords;
i40e_get_link_ksettings(netdev, &safe_ks);
 
-   /* save autoneg and speed out of ksettings */
-   autoneg = ks->base.autoneg;
-   ethtool_convert_link_mode_to_legacy_u32(&advertise,
-   ks->link_modes.advertising);
-
-   /* set autoneg and speed back to what they currently are */
+   /* set autoneg back to what it currently is */
copy_ks.base.autoneg = safe_ks.base.autoneg;
-   ethtool_convert_link_mode_to_legacy_u32(
-   &tmp, safe_ks.link_modes.advertising);
-   ethtool_convert_legacy_u32_to_link_mode(
-   copy_ks.link_modes.advertising, tmp);
 
-   copy_ks.base.cmd = safe_ks.base.cmd;
-
-   /* If copy_ks and safe_ks are not the same now, then they are
-* trying to set something that we do not support
+   /* If copy_ks.base and safe_ks.base are not the same now, then they are
+* trying to set something that we do not support.
 */
-   if (memcmp(&copy_ks, &safe_ks, sizeof(struct ethtool_link_ksettings)))
+   if (memcmp(&copy_ks.base, &safe_ks.base,
+  sizeof(struct ethtool_link_settings)))
return -EOPNOTSUPP;
 
while (test_and_set_bit(__I40E_CONFIG_BUSY, pf->state)) {
@@ -946,28 +950,45 @@ static int i40e_set_link_ksettings(struct net_device 
*netdev,
}
}
 
-   ethtool_convert_link_mode_to_legacy_u32(&tmp,
-   safe_ks.link_modes.supported);
-   if (advertise & ~tmp) {
-   err = -EINVAL;
-   goto done;
-   }
-
-   if (advertise & ADVERTISED_100baseT_Full)
+   if (ethtool_link_ksettings_test_link_mode(ks, advertising,
+ 100baseT_Full))
config.link_speed |= I40E_LINK_SPEED_100MB;
-   if (advertise & ADVERTISED_1000baseT_Full ||
-   advertise & ADVERTISED_1000baseKX_Full)
+   if (ethtool_link_ksettings_test_link_mode(ks, advertising,
+ 1000baseT_Full) ||
+   ethtool_link_ksettings_test_link_mode(ks, advertising,
+ 1000baseX_Full) ||
+   ethtool_link_ksettings_test_link_mode(ks, advertising,
+ 1000baseKX_Full))
config.link_speed |= I40E_LINK_SPEED_1GB;
-   if (advertise & ADVERTISED_10000baseT_Full ||
-   advertise & ADVERTISED_10000baseKX4_Full ||
-   advertise & ADVERTISED_10000baseKR_Full)
+   if (ethtool_link_ksettings_test_link_mode(ks, advertising,
+ 10000baseT_Full) ||
+
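
The new validation step in this patch rejects a request when the advertised link modes are not a subset of the supported ones. A minimal userspace sketch of such a subset test follows. It is written from scratch for illustration and is not the kernel's `bitmap_subset()` implementation; the `demo_` names are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return true when every bit set in a[] within the first nbits is also
 * set in b[].  Bits at positions >= nbits are ignored.
 */
static bool demo_bitmap_subset(const unsigned long *a,
			       const unsigned long *b, size_t nbits)
{
	size_t full = nbits / DEMO_BITS_PER_LONG;
	size_t rest = nbits % DEMO_BITS_PER_LONG;
	size_t i;

	assert(a && b);
	for (i = 0; i < full; i++)
		if (a[i] & ~b[i])
			return false;

	if (rest) {
		/* Only the low 'rest' bits of the last word are valid. */
		unsigned long mask = (1UL << rest) - 1;

		if (a[full] & ~b[full] & mask)
			return false;
	}
	return true;
}
```

Compared with the legacy check (`advertise & ~tmp` on two u32 values), a word-array test like this is not limited to the first 32 link modes, which is what allows the newer 25G and 10G modes to be validated.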

[net-next 02/15] i40e: remove ifdef SPEED_25000

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

This 'ifdef' doesn't accomplish anything so remove it.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 06514a76ff91..c250116e5e22 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -531,12 +531,7 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
ks->base.speed = SPEED_40000;
break;
case I40E_LINK_SPEED_25GB:
-#ifdef SPEED_25000
ks->base.speed = SPEED_25000;
-#else
-   netdev_info(netdev,
-   "Speed is 25G, display not supported by this version of ethtool.\n");
-#endif
break;
case I40E_LINK_SPEED_20GB:
ks->base.speed = SPEED_20000;
-- 
2.14.2

[net-next 11/15] i40e: convert i40e_phy_type_to_ethtool to new API

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

We are still largely using the old ethtool API macros.  This is
problematic because eventually they will be removed and they only
support 32 bits of PHY types.

This overhauls i40e_phy_type_to_ethtool to use only the new API.  Doing
this also allows us to provide much better support for newer 25G and 10G
PHY types which is included here as well.

The remaining usages of the old ethtool API will be addressed in other
patches in the series.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 204 +
 1 file changed, 140 insertions(+), 64 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a4210ccdaa5f..0cef8aa85c1d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -254,95 +254,180 @@ static void i40e_partition_setting_complaint(struct 
i40e_pf *pf)
 /**
  * i40e_phy_type_to_ethtool - convert the phy_types to ethtool link modes
  * @pf: PF struct with phy_types
- * @supported: pointer to the ethtool supported variable to fill in
- * @advertising: pointer to the ethtool advertising variable to fill in
+ * @ks: ethtool link ksettings struct to fill out
  *
  **/
-static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, u32 *supported,
-u32 *advertising)
+static void i40e_phy_type_to_ethtool(struct i40e_pf *pf,
+struct ethtool_link_ksettings *ks)
 {
struct i40e_link_status *hw_link_info = &pf->hw.phy.link_info;
u64 phy_types = pf->hw.phy.phy_types;
 
-   *supported = 0x0;
-   *advertising = 0x0;
+   ethtool_link_ksettings_zero_link_mode(ks, supported);
+   ethtool_link_ksettings_zero_link_mode(ks, advertising);
 
if (phy_types & I40E_CAP_PHY_TYPE_SGMII) {
-   *supported |= SUPPORTED_1000baseT_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+1000baseT_Full);
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
-   *advertising |= ADVERTISED_1000baseT_Full;
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+1000baseT_Full);
if (pf->hw_features & I40E_HW_100M_SGMII_CAPABLE) {
-   *supported |= SUPPORTED_100baseT_Full;
-   *advertising |= ADVERTISED_100baseT_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+100baseT_Full);
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+100baseT_Full);
}
}
if (phy_types & I40E_CAP_PHY_TYPE_XAUI ||
phy_types & I40E_CAP_PHY_TYPE_XFI ||
phy_types & I40E_CAP_PHY_TYPE_SFI ||
phy_types & I40E_CAP_PHY_TYPE_10GBASE_SFPP_CU ||
-   phy_types & I40E_CAP_PHY_TYPE_10GBASE_AOC)
-   *supported |= SUPPORTED_10000baseT_Full;
-   if (phy_types & I40E_CAP_PHY_TYPE_10GBASE_CR1_CU ||
-   phy_types & I40E_CAP_PHY_TYPE_10GBASE_CR1 ||
-   phy_types & I40E_CAP_PHY_TYPE_10GBASE_T ||
-   phy_types & I40E_CAP_PHY_TYPE_10GBASE_SR ||
-   phy_types & I40E_CAP_PHY_TYPE_10GBASE_LR) {
-   *supported |= SUPPORTED_10000baseT_Full;
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_AOC) {
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+10000baseT_Full);
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_10GB)
-   *advertising |= ADVERTISED_10000baseT_Full;
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+10000baseT_Full);
+   }
+   if (phy_types & I40E_CAP_PHY_TYPE_10GBASE_T) {
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+10000baseT_Full);
+   if (hw_link_info->requested_speeds & I40E_LINK_SPEED_10GB)
+   ethtool_link_ksettings_add_link_mode(ks, advertising,
+10000baseT_Full);
}
if (phy_types & I40E_CAP_PHY_TYPE_XLAUI ||
phy_types & I40E_CAP_PHY_TYPE_XLPPI ||
phy_types & I40E_CAP_PHY_TYPE_40GBASE_AOC)
-   *supported |= SUPPORTED_40000baseCR4_Full;
+   ethtool_link_ksettings_add_link_mode(ks, supported,
+

[net-next 01/15] i40e: rename 'cmd' variables in ethtool interface

2017-10-17 Thread Jeff Kirsher

From: Alan Brady 

After the switch to the new ethtool API, ethtool passes us
ethtool_ksettings structs instead of ethtool_command structs; however, we
were still referring to them as 'cmd' variables.  This renames them to
'ks' variables, which makes the code easier to understand.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 145 +
 1 file changed, 74 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 72d5f2cdf419..06514a76ff91 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -378,12 +378,12 @@ static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, 
u32 *supported,
 /**
  * i40e_get_settings_link_up - Get the Link settings for when link is up
  * @hw: hw structure
- * @ecmd: ethtool command to fill in
+ * @ks: ethtool ksettings to fill in
  * @netdev: network interface device structure
- *
+ * @pf: pointer to physical function struct
  **/
 static void i40e_get_settings_link_up(struct i40e_hw *hw,
- struct ethtool_link_ksettings *cmd,
+ struct ethtool_link_ksettings *ks,
  struct net_device *netdev,
  struct i40e_pf *pf)
 {
@@ -394,9 +394,9 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
u32 supported, advertising;
 
ethtool_convert_link_mode_to_legacy_u32(&supported,
-   cmd->link_modes.supported);
+   ks->link_modes.supported);
ethtool_convert_link_mode_to_legacy_u32(&advertising,
-   cmd->link_modes.advertising);
+   ks->link_modes.advertising);
 
/* Initialize supported and advertised settings based on phy settings */
switch (hw_link_info->phy_type) {
@@ -528,48 +528,49 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
/* Set speed and duplex */
switch (link_speed) {
case I40E_LINK_SPEED_40GB:
-   cmd->base.speed = SPEED_40000;
+   ks->base.speed = SPEED_40000;
break;
case I40E_LINK_SPEED_25GB:
 #ifdef SPEED_25000
-   cmd->base.speed = SPEED_25000;
+   ks->base.speed = SPEED_25000;
 #else
netdev_info(netdev,
"Speed is 25G, display not supported by this version of ethtool.\n");
 #endif
break;
case I40E_LINK_SPEED_20GB:
-   cmd->base.speed = SPEED_20000;
+   ks->base.speed = SPEED_20000;
break;
case I40E_LINK_SPEED_10GB:
-   cmd->base.speed = SPEED_10000;
+   ks->base.speed = SPEED_10000;
break;
case I40E_LINK_SPEED_1GB:
-   cmd->base.speed = SPEED_1000;
+   ks->base.speed = SPEED_1000;
break;
case I40E_LINK_SPEED_100MB:
-   cmd->base.speed = SPEED_100;
+   ks->base.speed = SPEED_100;
break;
default:
break;
}
-   cmd->base.duplex = DUPLEX_FULL;
+   ks->base.duplex = DUPLEX_FULL;
 
-   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   ethtool_convert_legacy_u32_to_link_mode(ks->link_modes.supported,
supported);
-   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
+   ethtool_convert_legacy_u32_to_link_mode(ks->link_modes.advertising,
advertising);
 }
 
 /**
  * i40e_get_settings_link_down - Get the Link settings for when link is down
  * @hw: hw structure
- * @ecmd: ethtool command to fill in
+ * @ks: ethtool ksettings to fill in
+ * @pf: pointer to physical function struct
  *
  * Reports link settings that can be determined when link is down
  **/
 static void i40e_get_settings_link_down(struct i40e_hw *hw,
-   struct ethtool_link_ksettings *cmd,
+   struct ethtool_link_ksettings *ks,
struct i40e_pf *pf)
 {
u32 supported, advertising;
@@ -579,25 +580,25 @@ static void i40e_get_settings_link_down(struct i40e_hw 
*hw,
 */
i40e_phy_type_to_ethtool(pf, &supported, &advertising);
 
-   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   ethtool_convert_legacy_u32_to_link_mode(ks->link_modes.supported,
supported);
-

Re: pull-request: mac80211 2017-10-16

2017-10-17 Thread Jason A. Donenfeld

On Tue, Oct 17, 2017 at 7:46 AM, Johannes Berg
 wrote:
> If it's not equal, you execute so much code
> beneath, going to the driver etc., that I'd think this particular time
> is in the noise.

Usually presumptions like this get you in trouble when some crafty
academic has a smart idea about that noise. I'll send a patch.

Re: [patch net-next 27/34] nfp: bpf: Convert ndo_setup_tc offloads to block callbacks

2017-10-17 Thread Jiri Pirko

Tue, Oct 17, 2017 at 04:39:59PM CEST, jakub.kicin...@netronome.com wrote:
>On Tue, 17 Oct 2017 14:48:12 +0200, Jiri Pirko wrote:
>> Fri, Oct 13, 2017 at 03:08:24AM CEST, jakub.kicin...@netronome.com wrote:
>> >On Thu, 12 Oct 2017 19:18:16 +0200, Jiri Pirko wrote:  
>> >> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
>> >> b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
>> >> index a88bb5b..9e9af88 100644
>> >> --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
>> >> +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
>> >> @@ -246,6 +246,10 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct 
>> >> tc_cls_bpf_offload *cls_bpf)
>> >>   void *code;
>> >>   int err;
>> >>  
>> >> + if (cls_bpf->common.protocol != htons(ETH_P_ALL) ||
>> >> + cls_bpf->common.chain_index)
>> >> + return -EOPNOTSUPP;
>> >> +
>> >>   max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
>> >>  
>> >>   switch (cls_bpf->command) {  
>> >
>> >It is certainly very ugly but I send a fake struct tc_cls_bpf_offload
>> >here for XDP.  Refactoring this mess is pretty high on my priority list
>> >but one way or the other this function will be called from XDP so TC
>> >checks must stay in the TC handler... :(  
>> 
>> Okay. But currently, why is it a problem? You don't need the checks for
>> xdp path.
>> 
>
>static int
>nfp_bpf_xdp_offload(struct nfp_app *app, struct nfp_net *nn,
>   struct bpf_prog *prog)
>{
>   struct tc_cls_bpf_offload cmd = {
>   .prog = prog,
>   };
>   int ret;
>
>   if (!nfp_net_ebpf_capable(nn))
>   return -EINVAL;
>
>   if (nn->dp.ctrl & NFP_NET_CFG_CTRL_BPF) {
>   if (!nn->dp.bpf_offload_xdp)
>   return prog ? -EBUSY : 0;
>   cmd.command = prog ? TC_CLSBPF_REPLACE : TC_CLSBPF_DESTROY;
>   } else {
>   if (!prog)
>   return 0;
>   cmd.command = TC_CLSBPF_ADD;
>   }
>
>   ret = nfp_net_bpf_offload(nn, &cmd);
>   /* Stop offload if replace not possible */
>   if (ret && cmd.command == TC_CLSBPF_REPLACE)
>   nfp_bpf_xdp_offload(app, nn, NULL);
>   nn->dp.bpf_offload_xdp = prog && !ret;
>   return ret;
>}
>
>The fake offload struct is at the top of this function.  Dereferencing
>cls_bpf->common in nfp_net_bpf_offload() will crash the kernel.  Or am
>I missing something?

We just have to init it. Should not be a problem. Will add it.

[PATCH] net/ethernet/sgi: Code cleanup

2017-10-17 Thread Joshua Kinard

From: Joshua Kinard 

The below patch attempts to clean up the code for the in-tree driver
for IOC3 ethernet and serial console support, primarily used by SGI
MIPS platforms.  Notable changes include:

  - Lots of whitespace cleanup
  - Using shorthand integer types (u16, u32, etc) where appropriate
  - Moving the function name to the next line after type declarations
  - Using the multiline comment syntax preferred by the networking
subsystem
  - Wrapping some long lines to ~80 chars
  - Spelling/grammar fixes in comments
  - Other minor cleanups

Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Signed-off-by: Joshua Kinard 
---
 drivers/net/ethernet/sgi/ioc3-eth.c |  425 ++
 1 file changed, 240 insertions(+), 185 deletions(-)

diff --git a/drivers/net/ethernet/sgi/ioc3-eth.c 
b/drivers/net/ethernet/sgi/ioc3-eth.c
index 9c0488e0f08e..ad014551923a 100644
--- a/drivers/net/ethernet/sgi/ioc3-eth.c
+++ b/drivers/net/ethernet/sgi/ioc3-eth.c
@@ -66,13 +66,13 @@
 #include 
 
 /*
- * 64 RX buffers.  This is tunable in the range of 16 <= x < 512.  The
- * value must be a power of two.
+ * 64 RX buffers.  This is tunable in the range of 16 <= x < 512.
+ * The value must be a power of two.
  */
 #define RX_BUFFS 64
 
-#define ETCSR_FD   ((17<midr_w)
 #define ioc3_w_midr_w(v)   do { ioc3->midr_w = cpu_to_be32(v); } while (0)
 
-static inline u32 mcr_pack(u32 pulse, u32 sample)
+static inline u32
+mcr_pack(u32 pulse, u32 sample)
 {
return (pulse << 10) | (sample << 2);
 }
 
-static int nic_wait(struct ioc3 *ioc3)
+static int
+nic_wait(struct ioc3 *ioc3)
 {
u32 mcr;
 
-do {
-mcr = ioc3_r_mcr();
-} while (!(mcr & 2));
+   do {
+   mcr = ioc3_r_mcr();
+   } while (!(mcr & 2));
 
-return mcr & 1;
+   return mcr & 1;
 }
 
-static int nic_reset(struct ioc3 *ioc3)
+static int
+nic_reset(struct ioc3 *ioc3)
 {
-int presence;
+   int presence;
 
ioc3_w_mcr(mcr_pack(500, 65));
presence = nic_wait(ioc3);
@@ -243,10 +248,11 @@ static int nic_reset(struct ioc3 *ioc3)
ioc3_w_mcr(mcr_pack(0, 500));
nic_wait(ioc3);
 
-return presence;
+   return presence;
 }
 
-static inline int

[PATCH net-next 3/3] ipv6: obsolete cached dst when removing them from fib tree

2017-10-17 Thread Paolo Abeni

Commit 2b760fcf5cfb ("ipv6: hook up exception table to store
dst cache") partially reverted commit 1e2ea8ad37be ("ipv6: set
dst.obsolete when a cached route has expired").

This change brings back the dst obsoleting and pushes it a step
further: cached dsts are always obsoleted when removed from the
fib tree, and removal by time expiration is now performed
regardless of dst->__refcnt, to be consistent with what we
already do for RTF_GATEWAY dsts.

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni 
---
 net/ipv6/route.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8b25a31b6b03..fce740049e3e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct 
rt6_exception_bucket *bucket,
if (!bucket || !rt6_ex)
return;
 
+   /* sockets, flow cache, etc. can hold a reference to this dst, be sure
+* they will drop it.
+*/
+   if (rt6_ex->rt6i)
+   rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
+
net = dev_net(rt6_ex->rt6i->dst.dev);
rt6_ex->rt6i->rt6i_node = NULL;
hlist_del_rcu(&rt6_ex->hlist);
@@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct 
rt6_exception_bucket *bucket,
 {
struct rt6_info *rt = rt6_ex->rt6i;
 
-   if (atomic_read(&rt->dst.__refcnt) == 1 &&
-   time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
+   /* we are pruning and obsoleting the exception route even if others
+* still hold a reference to it, so that on the next dst_check() such
+* references can be dropped
+*/
+   if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
RT6_TRACE("aging clone %p\n", rt);
rt6_remove_exception(bucket, rt6_ex);
return;
-- 
2.13.6
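
A minimal user-space sketch of the obsoleting idea described in this patch, under stated assumptions: the toy_* types and TOY_OBSOLETE_FORCE_CHK are hypothetical stand-ins, not the kernel's dst_entry or dst_check(). It only illustrates why marking the entry on removal lets other holders notice and drop their reference on their next check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached route entry. */
struct toy_dst {
	int obsolete;	/* 0 = valid, nonzero = holders must revalidate */
	int refcnt;	/* holders, including the table itself */
};

/* Hypothetical stand-in for a socket caching a route. */
struct toy_sock {
	struct toy_dst *cached;
};

#define TOY_OBSOLETE_FORCE_CHK 1	/* assumed marker value */

/* Remove the entry from the table: mark it first, then drop the
 * table's own reference. Other holders may still point at it.
 */
static void toy_remove_exception(struct toy_dst *dst)
{
	dst->obsolete = TOY_OBSOLETE_FORCE_CHK;
	dst->refcnt--;
}

/* Simplified model of a holder revalidating its cached entry:
 * an obsolete entry is released. Returns true if still usable.
 */
static bool toy_sock_check(struct toy_sock *sk)
{
	if (sk->cached && sk->cached->obsolete) {
		sk->cached->refcnt--;
		sk->cached = NULL;
	}
	return sk->cached != NULL;
}
```

Without the mark in toy_remove_exception(), the holder in this model would keep its reference indefinitely, which is the situation the patch description aims to avoid.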

Re: [PATCH net-next] tcp: Enable TFO without a cookie on a per-socket basis

2017-10-17 Thread Christoph Paasch

On 17/10/17 - 10:26:58, Yuchung Cheng wrote:
> On Mon, Oct 16, 2017 at 11:37 PM, Christoph Paasch  wrote:
> > We already allow enabling TFO without a cookie by using the
> > fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (0x200).
> > This is safe to do in certain environments where we know that there
> > isn't a malicious host (e.g., data centers).
> >
> > A server however might be talking to both sides (public Internet and
> > data-center). So, this server would want to enable cookie-less TFO for
> > the connections that go to the data-center while enforcing cookies for
> > the traffic from the Internet.
> >
> > This patch exposes a socket-option to enable this (protected by
> > CAP_NET_ADMIN).
> >
> > Signed-off-by: Christoph Paasch 
> > ---
> >  include/linux/tcp.h  |  1 +
> >  include/uapi/linux/tcp.h |  1 +
> >  net/ipv4/tcp.c   | 14 ++
> >  net/ipv4/tcp_fastopen.c  |  6 --
> >  4 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index 1d2c44e09e31..cda5d4dc8d70 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -228,6 +228,7 @@ struct tcp_sock {
> > syn_fastopen_ch:1, /* Active TFO re-enabling probe */
> > syn_data_acked:1,/* data in SYN is acked by SYN-ACK */
> > save_syn:1, /* Save headers of SYN packet */
> > +   no_tfo_cookie:1, /* Allow send/recv SYN+data without a 
> > cookie */
> can we rename to fastopen_no_cookie and move one line above so TFO
> stuff is together with similar naming.

Sure, will rename & move.

> 
> > is_cwnd_limited:1;/* forward progress limited by snd_cwnd? 
> > */
> > u32 tlp_high_seq;   /* snd_nxt at the time of TLP retransmit. */
> >
> > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> > index 15c25eccab2b..d44f4bef056c 100644
> > --- a/include/uapi/linux/tcp.h
> > +++ b/include/uapi/linux/tcp.h
> > @@ -119,6 +119,7 @@ enum {
> >  #define TCP_FASTOPEN_CONNECT   30  /* Attempt FastOpen with connect */
> >  #define TCP_ULP31  /* Attach a ULP to a TCP 
> > connection */
> >  #define TCP_MD5SIG_EXT 32  /* TCP MD5 Signature with 
> > extensions */
> > +#define TCP_NO_TFO_COOKIE  33  /* Enable TFO without a TFO cookie 
> > */
> >
> >  struct tcp_repair_opt {
> > __u32   opt_code;
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 3b34850d361f..88c90be12d9f 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2821,6 +2821,16 @@ static int do_tcp_setsockopt(struct sock *sk, int 
> > level,
> > err = -EOPNOTSUPP;
> > }
> > break;
> > +   case TCP_NO_TFO_COOKIE:
> rename to TCP_FASTOPEN_NO_COOKIE for better consistency on TFO
> options?

Yes, I will rename.

> I am also cooking a TCP_FASTOPEN_KEY option patch to allow
> listener to update the key.

I see - nice!


Thanks,
Christoph

> 
> > +   if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> > +   err = -EPERM;
> > +   else if (val > 1 || val < 0)
> > +   err = -EINVAL;
> > +   else if (!((1 << sk->sk_state) & (TCPF_CLOSE | 
> > TCPF_LISTEN)))
> > +   err = -EINVAL;
> > +   else
> > +   tp->no_tfo_cookie = 1;
> > +   break;
> > case TCP_TIMESTAMP:
> > if (!tp->repair)
> > err = -EPERM;
> > @@ -3219,6 +3229,10 @@ static int do_tcp_getsockopt(struct sock *sk, int 
> > level,
> > val = tp->fastopen_connect;
> > break;
> >
> > +   case TCP_NO_TFO_COOKIE:
> > +   val = tp->no_tfo_cookie;
> > +   break;
> > +
> > case TCP_TIMESTAMP:
> > val = tcp_time_stamp_raw() + tp->tsoffset;
> > break;
> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> > index 7ee4aadcdd71..c1b00b666b43 100644
> > --- a/net/ipv4/tcp_fastopen.c
> > +++ b/net/ipv4/tcp_fastopen.c
> > @@ -309,7 +309,8 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct 
> > sk_buff *skb,
> > return NULL;
> > }
> >
> > -   if (syn_data && (tcp_fastopen & TFO_SERVER_COOKIE_NOT_REQD))
> > +   if (syn_data && ((tcp_fastopen & TFO_SERVER_COOKIE_NOT_REQD) ||
> > +tcp_sk(sk)->no_tfo_cookie))
> > goto fastopen;
> >
> > if (foc->len >= 0 &&  /* Client presents or requests a cookie */
> > @@ -363,7 +364,8 @@ bool tcp_fastopen_cookie_check(struct sock *sk, u16 
> > *mss,
> > return false;
> > }
> >
> > -   if (sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_NO_COOKIE) {
> > +   if ((sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_NO_COOKIE) 
> > ||
> > +
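
As a rough user-space model of the checks in the quoted setsockopt hunk and of the server-side condition in tcp_try_fastopen(), assuming nothing beyond what the diff shows: all toy_* names are hypothetical, and storing val (rather than a constant) is an assumption of this sketch, not something the quoted patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical socket states; only CLOSE and LISTEN accept the option. */
enum toy_state { TOY_ESTABLISHED, TOY_CLOSE, TOY_LISTEN };

struct toy_tcp_sock {
	enum toy_state state;
	unsigned int no_cookie:1;	/* per-socket "cookie not required" flag */
};

/* Mirrors the order of checks in the quoted hunk:
 * capability, value range, socket state, then store.
 */
static int toy_set_no_cookie(struct toy_tcp_sock *tp, bool has_net_admin,
			     int val)
{
	if (!has_net_admin)
		return -EPERM;
	if (val > 1 || val < 0)
		return -EINVAL;
	if (tp->state != TOY_CLOSE && tp->state != TOY_LISTEN)
		return -EINVAL;
	tp->no_cookie = val;	/* assumption: honour val so 0 clears the flag */
	return 0;
}

/* Server side: SYN data is accepted without a cookie if either the
 * global sysctl bit or the per-socket flag allows it.
 */
static bool toy_server_skips_cookie(const struct toy_tcp_sock *tp,
				    bool sysctl_cookie_not_reqd,
				    bool syn_has_data)
{
	return syn_has_data && (sysctl_cookie_not_reqd || tp->no_cookie);
}
```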

Re: RFC(v2): Audit Kernel Container IDs

2017-10-17 Thread James Bottomley

On Tue, 2017-10-17 at 13:15 -0400, Steve Grubb wrote:
> On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > 
> > > 
> > > The idea is that processes spawned into a container would be
> > > labelled by the container orchestration system.  It's unclear
> > > what should happen to processes using nsenter after the fact, but
> > > policy for that should be up to the orchestration system.
> > 
> > I'm fine with that. The user space policy can be anything y'all
> > like.
> 
> I think there should be a login event.

I thought you wanted this for containers?  Container creation doesn't
have login events.  In an unprivileged orchestration system it may be
hard to synthetically manufacture them.

James

[PATCH net-next 2/3] ipv6: start fib6 gc on RTF_CACHE dst creation

2017-10-17 Thread Paolo Abeni

After commit 2b760fcf5cfb ("ipv6: hook up exception
table to store dst cache"), the fib6 gc is not started after
the creation of an RTF_CACHE dst via a redirect or pmtu update, since
fib6_add() isn't invoked anymore for such dsts.

We need the fib6 gc to run periodically to clean the RTF_CACHE,
or the dst will stay there forever.

Fix it by explicitly calling fib6_force_start_gc() on successful
exception creation. gc_args->more accounting will ensure that
the gc timer will run for whatever time needed to properly
clean the table.

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni 
---
 net/ipv6/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5bb53dbd4fd3..8b25a31b6b03 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1340,8 +1340,10 @@ static int rt6_insert_exception(struct rt6_info *nrt,
spin_unlock_bh(&rt6_exception_lock);
 
/* Update fn->fn_sernum to invalidate all cached dst */
-   if (!err)
+   if (!err) {
fib6_update_sernum(ort);
+   fib6_force_start_gc(net);
+   }
 
return err;
 }
-- 
2.13.6
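
A toy model of the behaviour described above, assuming a single counter for cached entries: the timer is armed on successful insertion and stays armed while a gc pass leaves entries behind (the "more" accounting). The toy_* names are hypothetical and do not correspond to kernel functions.

```c
#include <stdbool.h>

/* Hypothetical per-namespace state. */
struct toy_net {
	int cached;	/* cached exception routes still in the table */
	bool gc_armed;	/* whether the gc timer is pending */
};

/* err models the result of the actual insertion; on success the
 * gc timer is armed, standing in for fib6_force_start_gc().
 */
static int toy_insert_exception(struct toy_net *net, int err)
{
	if (!err) {
		net->cached++;
		net->gc_armed = true;
	}
	return err;
}

/* One gc pass: drop up to 'expired' entries and keep the timer
 * armed only while entries remain. Returns the remaining count.
 */
static int toy_gc_run(struct toy_net *net, int expired)
{
	int more;

	if (expired > net->cached)
		expired = net->cached;
	net->cached -= expired;
	more = net->cached;
	net->gc_armed = more > 0;
	return more;
}
```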

[PATCH net-next 1/3] ipv6: fix route cache dump

2017-10-17 Thread Paolo Abeni

After the commit 2b760fcf5cfb ("ipv6: hook up exception table to
store dst cache"), entries in the routing cache are not shown by:

ip route show cache

because the per route exception table containing such routes is not
traversed by rt6_dump_route().
Fix it by explicitly dumping all routes present in the
rt6i_exception_bucket.

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni 
---
 net/ipv6/route.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 01a103c23a6c..5bb53dbd4fd3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4190,10 +4190,14 @@ static int rt6_fill_node(struct net *net,
return -EMSGSIZE;
 }
 
+/* this is called under the RCU lock */
 int rt6_dump_route(struct rt6_info *rt, void *p_arg)
 {
struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
+   struct rt6_exception_bucket *bucket;
+   struct rt6_exception *rt6_ex;
struct net *net = arg->net;
+   int err, port_id, seq, i;
 
if (rt == net->ipv6.ip6_null_entry)
return 0;
@@ -4209,10 +4213,28 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
}
}
 
-   return rt6_fill_node(net,
-arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
-NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
-NLM_F_MULTI);
+   /* dump exceptions table, if available */
+   port_id = NETLINK_CB(arg->cb->skb).portid;
+   seq = arg->cb->nlh->nlmsg_seq;
+   bucket = rcu_dereference(rt->rt6i_exception_bucket);
+   if (!bucket)
+   goto no_exceptions;
+
+   for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
+   hlist_for_each_entry_rcu(rt6_ex, &bucket->chain, hlist) {
+   err = rt6_fill_node(net, arg->skb, rt6_ex->rt6i, NULL,
+   NULL, 0, RTM_NEWROUTE, port_id, seq,
+   NLM_F_MULTI);
+   if (err)
+   return err;
+   }
+
+   bucket++;
+   }
+
+no_exceptions:
+   return rt6_fill_node(net, arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
+port_id, seq, NLM_F_MULTI);
 }
 
 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
-- 
2.13.6
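
The traversal order in the hunk above can be sketched in user space as follows, assuming plain singly linked chains instead of RCU hlists: every exception in every bucket is emitted first, then the route itself, and the first fill error stops the dump. All toy_* names and TOY_BUCKETS are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_BUCKETS 4	/* assumed bucket count for the sketch */
#define TOY_DUMP_MAX 8	/* assumed capacity of the output buffer */

struct toy_exception {
	int id;
	struct toy_exception *next;
};

struct toy_bucket {
	struct toy_exception *chain;
};

struct toy_route {
	int id;
	struct toy_bucket *buckets;	/* NULL when there are no exceptions */
};

struct toy_dump_buf {
	int ids[TOY_DUMP_MAX];
	int len;
};

/* Stand-in for rt6_fill_node(): append one entry or report no room. */
static int toy_fill(struct toy_dump_buf *buf, int id)
{
	if (buf->len == TOY_DUMP_MAX)
		return -EMSGSIZE;
	buf->ids[buf->len++] = id;
	return 0;
}

/* Dump all exceptions hanging off the route, then the route itself. */
static int toy_dump_route(const struct toy_route *rt, struct toy_dump_buf *buf)
{
	const struct toy_bucket *bucket = rt->buckets;
	const struct toy_exception *ex;
	int err, i;

	if (bucket) {
		for (i = 0; i < TOY_BUCKETS; i++, bucket++) {
			for (ex = bucket->chain; ex; ex = ex->next) {
				err = toy_fill(buf, ex->id);
				if (err)
					return err;
			}
		}
	}
	return toy_fill(buf, rt->id);
}
```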

[PATCH net-next 0/3] ipv6: fixes for RTF_CACHE entries

2017-10-17 Thread Paolo Abeni

This series addresses 3 different but related issues with RTF_CACHE introduced
by the recent refactoring.

patch 1 restores the dump of routes created by redirect and pmtu updates
patch 2 restores the gc timer for such routes
patch 3 obsoletes the dst on removal from exception tables

Paolo Abeni (3):
  ipv6: fix route cache dump
  ipv6: start fib6 gc on RTF_CACHE dst creation
  ipv6: obsolete cached dst when removing them from fib tree

 net/ipv6/route.c | 47 ---
 1 file changed, 40 insertions(+), 7 deletions(-)

-- 
2.13.6

[PATCH net-next 3/3] ibmvnic: Let users change net device features

2017-10-17 Thread Thomas Falcon

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index aedb81c..b991703 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3019,6 +3019,8 @@ static void handle_query_ip_offload_rsp(struct 
ibmvnic_adapter *adapter)
if (buf->large_tx_ipv6)
adapter->netdev->features |= NETIF_F_TSO6;
 
+   adapter->netdev->hw_features |= adapter->netdev->features;
+
memset(&crq, 0, sizeof(crq));
crq.control_ip_offload.first = IBMVNIC_CRQ_CMD;
crq.control_ip_offload.cmd = CONTROL_IP_OFFLOAD;
-- 
1.8.3.1

[PATCH net-next 2/3] ibmvnic: Enable TSO support

2017-10-17 Thread Thomas Falcon

This patch enables TSO support. It includes additional
buffers reserved exclusively for large packets. Throughput
is greatly increased with TSO enabled, from about 1 Gb/s to
9 Gb/s on our test systems.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 56 --
 drivers/net/ethernet/ibm/ibmvnic.h |  5 
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index b508877..aedb81c 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -553,6 +553,10 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
if (rc)
return rc;
 
+   rc = reset_long_term_buff(adapter, &tx_pool->tso_ltb);
+   if (rc)
+   return rc;
+
memset(tx_pool->tx_buff, 0,
   adapter->req_tx_entries_per_subcrq *
   sizeof(struct ibmvnic_tx_buff));
@@ -562,6 +566,7 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
 
tx_pool->consumer_index = 0;
tx_pool->producer_index = 0;
+   tx_pool->tso_index = 0;
}
 
return 0;
@@ -581,6 +586,7 @@ static void release_tx_pools(struct ibmvnic_adapter 
*adapter)
tx_pool = &adapter->tx_pool[i];
kfree(tx_pool->tx_buff);
free_long_term_buff(adapter, &tx_pool->long_term_buff);
+   free_long_term_buff(adapter, &tx_pool->tso_ltb);
kfree(tx_pool->free_map);
}
 
@@ -625,6 +631,16 @@ static int init_tx_pools(struct net_device *netdev)
return -1;
}
 
+   /* alloc TSO ltb */
+   if (alloc_long_term_buff(adapter, &tx_pool->tso_ltb,
+IBMVNIC_TSO_BUFS *
+IBMVNIC_TSO_BUF_SZ)) {
+   release_tx_pools(adapter);
+   return -1;
+   }
+
+   tx_pool->tso_index = 0;
+
tx_pool->free_map = kcalloc(adapter->req_tx_entries_per_subcrq,
sizeof(int), GFP_KERNEL);
if (!tx_pool->free_map) {
@@ -1201,10 +1217,21 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct 
net_device *netdev)
be32_to_cpu(adapter->login_rsp_buf->off_txsubm_subcrqs));
 
index = tx_pool->free_map[tx_pool->consumer_index];
-   offset = index * adapter->req_mtu;
-   dst = tx_pool->long_term_buff.buff + offset;
-   memset(dst, 0, adapter->req_mtu);
-   data_dma_addr = tx_pool->long_term_buff.addr + offset;
+
+   if (skb_is_gso(skb)) {
+   offset = tx_pool->tso_index * IBMVNIC_TSO_BUF_SZ;
+   dst = tx_pool->tso_ltb.buff + offset;
+   memset(dst, 0, IBMVNIC_TSO_BUF_SZ);
+   data_dma_addr = tx_pool->tso_ltb.addr + offset;
+   tx_pool->tso_index++;
+   if (tx_pool->tso_index == IBMVNIC_TSO_BUFS)
+   tx_pool->tso_index = 0;
+   } else {
+   offset = index * adapter->req_mtu;
+   dst = tx_pool->long_term_buff.buff + offset;
+   memset(dst, 0, adapter->req_mtu);
+   data_dma_addr = tx_pool->long_term_buff.addr + offset;
+   }
 
if (skb_shinfo(skb)->nr_frags) {
int cur, i;
@@ -1245,7 +1272,10 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct 
net_device *netdev)
tx_crq.v1.n_sge = 1;
tx_crq.v1.flags1 = IBMVNIC_TX_COMP_NEEDED;
tx_crq.v1.correlator = cpu_to_be32(index);
-   tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->long_term_buff.map_id);
+   if (skb_is_gso(skb))
+   tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->tso_ltb.map_id);
+   else
+   tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->long_term_buff.map_id);
tx_crq.v1.sge_len = cpu_to_be32(skb->len);
tx_crq.v1.ioba = cpu_to_be64(data_dma_addr);
 
@@ -1270,6 +1300,11 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct 
net_device *netdev)
tx_crq.v1.flags1 |= IBMVNIC_TX_CHKSUM_OFFLOAD;
hdrs += 2;
}
+   if (skb_is_gso(skb)) {
+   tx_crq.v1.flags1 |= IBMVNIC_TX_LSO;
+   tx_crq.v1.mss = cpu_to_be16(skb_shinfo(skb)->gso_size);
+   hdrs += 2;
+   }
/* determine if l2/3/4 headers are sent to firmware */
if ((*hdrs >> 7) & 1 &&
(skb->protocol == htons(ETH_P_IP) ||
@@ -2960,10 +2995,10 @@ static void handle_query_ip_offload_rsp(struct 
ibmvnic_adapter *adapter)
adapter->ip_offload_ctrl.udp_ipv4_chksum = buf->udp_ipv4_chksum;
adapter->ip_offload_ctrl.tcp_ipv6_chksum = buf->tcp_ipv6_chksum;
adapter->ip_offload_ctrl.udp_ipv6_chksum = buf->udp_ipv6_chksum;
+
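
The TSO slot selection in ibmvnic_xmit() above amounts to a simple wrapping index over a fixed set of large buffers. A small sketch, assuming made-up values for the buffer count and size (the real IBMVNIC_TSO_BUFS and IBMVNIC_TSO_BUF_SZ definitions are not shown in this message); the toy_* names are hypothetical.

```c
#include <stddef.h>

#define TOY_TSO_BUFS	4	/* assumed number of TSO slots */
#define TOY_TSO_BUF_SZ	65536	/* assumed size of one slot in bytes */

struct toy_tx_pool {
	size_t tso_index;	/* next TSO slot to hand out */
};

/* Return the byte offset of the next TSO slot inside the long-term
 * buffer and advance the index, wrapping back to slot 0 at the end.
 */
static size_t toy_next_tso_offset(struct toy_tx_pool *pool)
{
	size_t offset = pool->tso_index * TOY_TSO_BUF_SZ;

	pool->tso_index++;
	if (pool->tso_index == TOY_TSO_BUFS)
		pool->tso_index = 0;
	return offset;
}
```

Note that in this model a slot is reused after TOY_TSO_BUFS transmissions regardless of whether the earlier one has completed; whether that matters depends on completion handling that is outside this sketch.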

[PATCH net-next 1/3] ibmvnic: Enable scatter-gather support

2017-10-17 Thread Thomas Falcon

This patch enables scatter-gather support. Since there is no
HW/FW scatter-gather support at this time, the driver needs to
loop through each fragment and copy it to a contiguous, pre-mapped
buffer entry.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 4bc14a9..b508877 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1204,9 +1204,28 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct 
net_device *netdev)
offset = index * adapter->req_mtu;
dst = tx_pool->long_term_buff.buff + offset;
memset(dst, 0, adapter->req_mtu);
-   skb_copy_from_linear_data(skb, dst, skb->len);
data_dma_addr = tx_pool->long_term_buff.addr + offset;
 
+   if (skb_shinfo(skb)->nr_frags) {
+   int cur, i;
+
+   /* Copy the head */
+   skb_copy_from_linear_data(skb, dst, skb_headlen(skb));
+   cur = skb_headlen(skb);
+
+   /* Copy the frags */
+   for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+   const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+   memcpy(dst + cur,
+  page_address(skb_frag_page(frag)) +
+  frag->page_offset, skb_frag_size(frag));
+   cur += skb_frag_size(frag);
+   }
+   } else {
+   skb_copy_from_linear_data(skb, dst, skb->len);
+   }
+
tx_pool->consumer_index =
(tx_pool->consumer_index + 1) %
adapter->req_tx_entries_per_subcrq;
@@ -2948,7 +2967,7 @@ static void handle_query_ip_offload_rsp(struct 
ibmvnic_adapter *adapter)
adapter->ip_offload_ctrl.large_rx_ipv4 = 0;
adapter->ip_offload_ctrl.large_rx_ipv6 = 0;
 
-   adapter->netdev->features = NETIF_F_GSO;
+   adapter->netdev->features = NETIF_F_SG | NETIF_F_GSO;
 
if (buf->tcp_ipv4_chksum || buf->udp_ipv4_chksum)
adapter->netdev->features |= NETIF_F_IP_CSUM;
-- 
1.8.3.1
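
The copy loop in this patch linearizes a fragmented packet into one contiguous buffer: head first, then each fragment at a running offset. A user-space sketch with hypothetical toy_* types in place of sk_buff and skb_frag_t; the up-front length check is an addition of this sketch and is not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one page fragment. */
struct toy_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical stand-in for a packet with a linear head and fragments. */
struct toy_skb {
	const unsigned char *head;
	size_t head_len;
	const struct toy_frag *frags;
	size_t nr_frags;
};

/* Copy head and fragments back to back into dst. Returns the number
 * of bytes written, or 0 if dst is too small (sketch-only check).
 */
static size_t toy_linearize(const struct toy_skb *skb, unsigned char *dst,
			    size_t dst_len)
{
	size_t total = skb->head_len;
	size_t cur, i;

	for (i = 0; i < skb->nr_frags; i++)
		total += skb->frags[i].len;
	if (total > dst_len)
		return 0;

	/* Copy the head */
	memcpy(dst, skb->head, skb->head_len);
	cur = skb->head_len;

	/* Copy the frags */
	for (i = 0; i < skb->nr_frags; i++) {
		memcpy(dst + cur, skb->frags[i].data, skb->frags[i].len);
		cur += skb->frags[i].len;
	}
	return cur;
}
```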
