Re: [PATCH net] sctp: translate network order to host order when users get a hmacid

2016-02-04 Thread Marcelo Ricardo Leitner
On Wed, Feb 03, 2016 at 11:33:30PM +0800, Xin Long wrote:
> Commit ed5a377d87dc ("sctp: translate host order to network order when
> setting a hmacid") corrected the hmacid byte-order when setting a hmacid,
> but the same issue also exists when getting a hmacid.
> 
> We fix it by changing hmacids to host order when users get them with
> getsockopt.
> 
> Fixes: ed5a377d87dc ("sctp: translate host order to network order when
> setting a hmacid")
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/socket.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 5ca2ebf..e878da0 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5538,6 +5538,7 @@ static int sctp_getsockopt_hmac_ident(struct sock *sk, 
> int len,
>   struct sctp_hmac_algo_param *hmacs;
>   __u16 data_len = 0;
>   u32 num_idents;
> + int i;
>  
>   if (!ep->auth_enable)
>   return -EACCES;
> @@ -5555,8 +5556,12 @@ static int sctp_getsockopt_hmac_ident(struct sock *sk, 
> int len,
>   return -EFAULT;
>   if (put_user(num_idents, &p->shmac_num_idents))
>   return -EFAULT;
> - if (copy_to_user(p->shmac_idents, hmacs->hmac_ids, data_len))
> - return -EFAULT;
> + for (i = 0; i < num_idents; i++) {
> + __u16 hmacid = ntohs(hmacs->hmac_ids[i]);
> +
> + if (copy_to_user(&p->shmac_idents[i], &hmacid, sizeof(__u16)))
> + return -EFAULT;
> + }
>   return 0;
>  }
>  
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Ingo Molnar

* Ingo Molnar  wrote:

> s/!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> 
> > +
> > +   /* Check length */
> > +10:cmpl$8, %esi
> > +   jg  30f
> > +   jl  20f
> > +
> > +   /* Exactly 8 bytes length */
> > +   addl(%rdi), %eax
> > +   adcl4(%rdi), %eax
> > +   RETURN
> > +
> > +   /* Less than 8 bytes length */
> > +20:clc
> > +   jmpq *branch_tbl_len(, %rsi, 8)
> > +
> > +   /* Greater than 8 bytes length. Determine number of quads (n). Sum
> > +* over first n % 16 quads
> > +*/
> > +30:movl%esi, %ecx
> > +   shrl$3, %ecx
> > +   andl$0xf, %ecx
> > +   negq%rcx
> > +   lea 40f(, %rcx, 4), %r11
> > +   clc
> > +   jmp *%r11
> 
> Are you absolutely sure that a jump table is the proper solution here? It 
> defeats branch prediction on most x86 uarchs. Why not label the loop stages 
> and 
> jump in directly with a binary tree of branches?

So just to expand on this a bit, attached below is a quick & simple & stupid 
testcase that generates a 16 entries call table. (Indirect jumps and indirect 
calls are predicted similarly on most x86 uarchs.) Just built it with:

  gcc -Wall -O2 -o jump-table jump-table.c

Even on relatively modern x86 uarchs (I ran this on a post-Nehalem IVB Intel CPU
and also on an AMD Opteron; the numbers are from the Intel box) this gives a high
branch miss rate with a 16-entry jump table:

 triton:~> taskset 1 perf stat --repeat 10 ./jump-table 16
 ... using 16 jump table entries.
 ... using 100000000 loop iterations.
 ... result: 10001
 [...]

 Performance counter stats for './jump-table 16' (10 runs):

       1386.131780  task-clock (msec)        #  1.001 CPUs utilized            ( +-  0.18% )
                 33  context-switches         #  0.024 K/sec                    ( +-  1.71% )
                  0  cpu-migrations           #  0.000 K/sec
                 52  page-faults              #  0.037 K/sec                    ( +-  0.71% )
      6,247,215,683  cycles                   #  4.507 GHz                      ( +-  0.18% )
      3,895,337,877  stalled-cycles-frontend  # 62.35% frontend cycles idle     ( +-  0.30% )
      1,404,014,996  instructions             #  0.22  insns per cycle
                                              #  2.77  stalled cycles per insn  ( +-  0.02% )
        300,820,988  branches                 # 217.022 M/sec                   ( +-  0.02% )
         87,518,741  branch-misses            # 29.09% of all branches          ( +-  0.01% )

        1.385240076 seconds time elapsed                                        ( +-  0.21% )

... as you can see the branch miss rate is very significant, causing a stalled 
decoder and very low instruction throughput.
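
(The attached jump-table.c is not reproduced in this archive. As a rough
illustration only, a hypothetical microbenchmark in the same spirit, an indirect
call through a table whose number of live targets is a runtime parameter, could
look like the following; it is not Ingo's test program and its absolute numbers
will differ.)

#include <stdio.h>
#include <stdlib.h>

typedef long (*fn_t)(long);

/* 16 distinct targets so the indirect call really has to pick one */
#define DEFINE_FN(n) static long fn_##n(long x) { return x + n; }
DEFINE_FN(0)  DEFINE_FN(1)  DEFINE_FN(2)  DEFINE_FN(3)
DEFINE_FN(4)  DEFINE_FN(5)  DEFINE_FN(6)  DEFINE_FN(7)
DEFINE_FN(8)  DEFINE_FN(9)  DEFINE_FN(10) DEFINE_FN(11)
DEFINE_FN(12) DEFINE_FN(13) DEFINE_FN(14) DEFINE_FN(15)

static const fn_t table[16] = {
	fn_0,  fn_1,  fn_2,  fn_3,  fn_4,  fn_5,  fn_6,  fn_7,
	fn_8,  fn_9,  fn_10, fn_11, fn_12, fn_13, fn_14, fn_15,
};

int main(int argc, char **argv)
{
	/* number of live jump table entries: 1, 2 or 16 as in the runs above */
	int entries = (argc > 1) ? atoi(argv[1]) : 16;
	unsigned long iters = 100000000UL;
	unsigned long i;
	long sum = 0;

	if (entries < 1 || entries > 16)
		entries = 16;

	/*
	 * With a single live target the indirect call is trivially
	 * predicted; cycling through many targets defeats the predictor
	 * and shows up as branch-misses in perf.
	 */
	for (i = 0; i < iters; i++)
		sum = table[i % (unsigned long)entries](sum);

	printf("... result: %ld\n", sum);
	return 0;
}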

I have to reduce the jump table to a single entry (!) to get good performance 
on 
this CPU:

 Performance counter stats for './jump-table 1' (10 runs):

        739.173505  task-clock (msec)        #  1.001 CPUs utilized            ( +-  0.26% )
                 37  context-switches         #  0.051 K/sec                    ( +- 16.79% )
                  0  cpu-migrations           #  0.000 K/sec
                 52  page-faults              #  0.070 K/sec                    ( +-  0.41% )
      3,331,328,405  cycles                   #  4.507 GHz                      ( +-  0.26% )
      2,012,973,596  stalled-cycles-frontend  # 60.43% frontend cycles idle     ( +-  0.47% )
      1,403,880,792  instructions             #  0.42  insns per cycle
                                              #  1.43  stalled cycles per insn  ( +-  0.05% )
        300,817,064  branches                 # 406.964 M/sec                   ( +-  0.05% )
             12,177  branch-misses            #  0.00% of all branches          ( +- 12.39% )

        0.738616356 seconds time elapsed                                        ( +-  0.26% )

Note how the runtime got halved: that is because stalls got halved and the 
instructions per cycle throughput doubled.

Even a two-entry jump table performs poorly:

 Performance counter stats for './jump-table 2' (10 runs):

       1493.790686  task-clock (msec)        #  1.001 CPUs utilized            ( +-  0.06% )
                 39  context-switches         #  0.026 K/sec                    ( +-  4.73% )
                  0  cpu-migrations           #  0.000 K/sec
                 52  page-faults              #  0.035 K/sec                    ( +-  0.26% )
      6,732,372,612  cycles                   #  4.507 GHz                      ( +-  0.06% )
      4,229,130,302  stalled-cycles-frontend  # 62.82% frontend cycles idle     ( +-  0.09% )
      1,407,803,145  instructions             #  0.21  insns per cycle


Re: Keystone 2 boards boot failure

2016-02-04 Thread Grygorii Strashko
Hi Arnd,

On 02/03/2016 10:40 PM, Arnd Bergmann wrote:
> On Wednesday 03 February 2016 18:31:00 Grygorii Strashko wrote:
>> On 02/03/2016 06:20 PM, Arnd Bergmann wrote:
>>> On Wednesday 03 February 2016 16:21:05 Grygorii Strashko wrote:
 On 02/03/2016 04:11 PM, Franklin S Cooper Jr. wrote:
> On 02/02/2016 07:19 PM, Franklin S Cooper Jr. wrote:
>>>
>>> This looks wrong: I was getting the build warnings originally
>>> because of 64-bit dma_addr_t, and that should be the only way that
>>> this driver can operate, because in some configurations on keystone
>>> there is no memory below 4GB, and there is no dma-ranges property
>>> in the DT that shifts around the start of the DMA addresses.
>>
>> see keystone.dtsi:
>>  soc {
>>  #address-cells = <1>;
>>  #size-cells = <1>;
>>  compatible = "ti,keystone","simple-bus";
>>  interrupt-parent = <&gic>;
>>  ranges = <0x0 0x0 0x0 0xc0000000>;
>>  dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;
>>  ^^^
> 
> You are right, I totally missed it when I looked again. I thought it
> was correct but then couldn't find it in the dts.
> 
>> config:
>>
>> CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
>> CONFIG_PHYS_ADDR_T_64BIT=y
>>
>> and
>>
>> #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT <--- should not be defined for KS2
>> typedef u64 dma_addr_t;
>> #else
>> typedef u32 dma_addr_t;
>> #endif
>>
>> Above is valid configuration for Keystone 2 with LPAE=y
> 
> Ok, but what do you mean with "should not be defined"? It clearly is
> defined in any multiplatform configuration that enables another platform
> needing 64-bit dma_addr_t.
> 

Then we probably have a bigger problem :) KS2 will not work as-is with
such a configuration, and not only KS2 - LPAE is enabled for TI DRA7 as well.

The problem here is that dma_addr_t is used to fill DMA controller descriptors
or can be written directly to a register, so all drivers that were originally
written for 32-bit HW under the assumption that dma_addr_t is 32 bits need to
be revised.
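
The problematic pattern looks roughly like this (a hypothetical driver fragment;
the register name is made up for illustration):

#include <linux/dma-mapping.h>
#include <linux/io.h>

/* hypothetical 32-bit-only descriptor address register */
#define DMA_DESC_ADDR_REG	0x10

static void hw_queue_rx_buffer(void __iomem *base, struct device *dev,
			       void *buf, size_t len)
{
	dma_addr_t handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	/* mapping-error check omitted for brevity */

	/*
	 * writel() takes a u32, so when dma_addr_t is 64-bit this silently
	 * truncates the upper bits and the device DMAs to the wrong address
	 * whenever the mapping lies above 4GB.
	 */
	writel(handle, base + DMA_DESC_ADDR_REG);
}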

Also, I'm not sure that it will be possible to support both LE/BE in such a case.

Actually, I've tried current multi_v7_defconfig and can see:
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
# CONFIG_PHYS_ADDR_T_64BIT is not set
# CONFIG_ARM_LPAE is not set
What is your "multiplatform configuration" ??

So I propose to fix this regression first (as we both did - revert the changes
in get/set_pad_info()) and have KS2 working again with the current versions of
the defconfig files (keystone_defconfig & multi_v7_defconfig) while this
discussion continues.

-- 
regards,
-grygorii


Re: [PATCH net-next v5 1/2] ethtool: add speed/duplex validation functions

2016-02-04 Thread Nikolay Aleksandrov
On 02/04/2016 12:32 AM, Stephen Hemminger wrote:
> On Wed,  3 Feb 2016 04:04:36 +0100
> Nikolay Aleksandrov  wrote:
> 
>>  
>> +static inline int ethtool_validate_speed(__u32 speed)
>> +{
> 
> 
> No need for inline.
> 
This is defined in a header; if it's not inline, you start getting
"defined but not used" warnings.

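A minimal illustration of why (not the actual ethtool patch, just a sketch):

/* example.h - illustration only */

/* OK in a header: no out-of-line copy is emitted in TUs that never call it */
static inline int example_validate(unsigned int v)
{
	return v != 0;
}

/*
 * If 'inline' is dropped, every .c file that includes this header but never
 * calls example_validate() gets its own unused static definition, and gcc
 * warns: "'example_validate' defined but not used" (-Wunused-function).
 */
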
> But why check for valid value at all. At some point in the
> future, there will be yet another speed adopted by some standard body
> and the switch statement would need another value.
> 
> Why not accept any value? This is a virtual device.
> 
It was moved near the defined values so that everyone adding a new speed would
remember to update the validation function as well. That being said, I don't
object to being able to set any custom speed on the virtio_net device, especially
since there are physical devices that can have speeds outside of these defines.

Michael, do you have any objections if I respin without the speed validation?

Thanks,
 Nik



Re: [PATCH net v4] r8169: Disabling multiple invocation on rtl_try_msi function.

2016-02-04 Thread Corcodel Marian
On Thu, 2016-02-04 at 11:35 +0200, Corcodel Marian wrote:
> This patch sets MSI at probe stage; without this patch, MSI is set when
> rtl_open occurs.
>  There is no need to run rtl_try_msi multiple times.
> 
> Signed-off-by: Corcodel Marian 
> ---
>  drivers/net/ethernet/realtek/r8169.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c
> b/drivers/net/ethernet/realtek/r8169.c
> index 30eed0d..b0b43b7 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -8248,6 +8248,9 @@ static int rtl_init_one(struct pci_dev *pdev,
> const struct pci_device_id *ent)
>  
>   /* Identify chip attached to board */
>   rtl8169_get_mac_version(tp, dev, cfg->default_ver);
> + RTL_W8(Cfg9346, Cfg9346_Unlock);
> + tp->features = rtl_try_msi(tp, cfg);
> + RTL_W8(Cfg9346, Cfg9346_Lock);
>  
>   rtl_init_rxcfg(tp);
>  
> @@ -8303,7 +8306,6 @@ static int rtl_init_one(struct pci_dev *pdev,
> const struct pci_device_id *ent)
>   }
>   if ((RTL_R8(Config5) & (UWF | BWF | MWF)) != 0)
>   tp->features |= RTL_FEATURE_WOL;
> - tp->features |= rtl_try_msi(tp, cfg);
>   RTL_W8(Cfg9346, Cfg9346_Lock);
>  
>   if (rtl_tbi_enabled(tp)) {


Sorry, the patch here is good but the explanation is bad.
It is about making rtl_try_msi run first, before the others.


Re: [PATCH net-next v5 2/2] virtio_net: add ethtool support for set and get of settings

2016-02-04 Thread Michael S. Tsirkin
On Wed, Feb 03, 2016 at 04:04:37AM +0100, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov 
> 
> This patch allows the user to set and retrieve speed and duplex of the
> virtio_net device via ethtool. Having this functionality is very helpful
> for simulating different environments and also enables the virtio_net
> device to participate in operations where proper speed and duplex are
> required (e.g. currently bonding lacp mode requires full duplex). Custom
> speed and duplex are not allowed; the user-supplied settings are validated
> before being applied.
> 
> Example:
> $ ethtool eth1
> Settings for eth1:
> ...
>   Speed: Unknown!
>   Duplex: Unknown! (255)
> $ ethtool -s eth1 speed 1000 duplex full
> $ ethtool eth1
> Settings for eth1:
> ...
>   Speed: 1000Mb/s
>   Duplex: Full
> 
> Based on a patch by Roopa Prabhu.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> v2: use the new ethtool speed/duplex validation functions and allow half
> duplex to be set
> v3: return error if the user tries to change anything besides speed/duplex
> as per Michael's comment
> We have to zero-out advertising as it gets set automatically by ethtool if
> setting speed and duplex together.
> v4: Set port type to PORT_OTHER
> v5: null diff1.port because we set cmd->port now and ethtool returns it in
> the set request, retested all cases
> 
>  drivers/net/virtio_net.c | 60 
> 
>  1 file changed, 60 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 767ab11a6e9f..c9fd52a8e6ec 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -146,6 +146,10 @@ struct virtnet_info {
>   virtio_net_ctrl_ack ctrl_status;
>   u8 ctrl_promisc;
>   u8 ctrl_allmulti;
> +
> + /* Ethtool settings */
> + u8 duplex;
> + u32 speed;
>  };
>  
>  struct padded_vnet_hdr {
> @@ -1376,6 +1380,58 @@ static void virtnet_get_channels(struct net_device 
> *dev,
>   channels->other_count = 0;
>  }
>  
> +/* Check if the user is trying to change anything besides speed/duplex */
> +static bool virtnet_validate_ethtool_cmd(const struct ethtool_cmd *cmd)
> +{
> + struct ethtool_cmd diff1 = *cmd;
> + struct ethtool_cmd diff2 = {};
> +
> + /* advertising and cmd are usually set, ignore port because we set it */

We set it where?
Instead of this, should not we set diff2.port to PORT_OTHER?

> + ethtool_cmd_speed_set(&diff1, 0);
> + diff1.advertising = 0;
> + diff1.duplex = 0;
> + diff1.port = 0;
> + diff1.cmd = 0;
> +
> + return !memcmp(&diff1, &diff2, sizeof(diff1));
> +}
> +
> +static int virtnet_set_settings(struct net_device *dev, struct ethtool_cmd 
> *cmd)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + u32 speed;
> +
> + speed = ethtool_cmd_speed(cmd);
> + /* don't allow custom speed and duplex */
> + if (!ethtool_validate_speed(speed) ||
> + !ethtool_validate_duplex(cmd->duplex) ||
> + !virtnet_validate_ethtool_cmd(cmd))
> + return -EINVAL;
> + vi->speed = speed;
> + vi->duplex = cmd->duplex;
> +
> + return 0;
> +}
> +
> +static int virtnet_get_settings(struct net_device *dev, struct ethtool_cmd 
> *cmd)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + ethtool_cmd_speed_set(cmd, vi->speed);
> + cmd->duplex = vi->duplex;
> + cmd->port = PORT_OTHER;
> +
> + return 0;
> +}
> +
> +static void virtnet_init_settings(struct net_device *dev)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + vi->speed = SPEED_UNKNOWN;
> + vi->duplex = DUPLEX_UNKNOWN;
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>   .get_drvinfo = virtnet_get_drvinfo,
>   .get_link = ethtool_op_get_link,
> @@ -1383,6 +1439,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>   .set_channels = virtnet_set_channels,
>   .get_channels = virtnet_get_channels,
>   .get_ts_info = ethtool_op_get_ts_info,
> + .get_settings = virtnet_get_settings,
> + .set_settings = virtnet_set_settings,
>  };
>  
>  #define MIN_MTU 68
> @@ -1855,6 +1913,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>   netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
>   netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
>  
> + virtnet_init_settings(dev);
> +
>   err = register_netdev(dev);
>   if (err) {
>   pr_debug("virtio_net: registering device failed\n");
> -- 
> 2.4.3


Re: [PATCH net-next v5 1/2] ethtool: add speed/duplex validation functions

2016-02-04 Thread Michael S. Tsirkin
On Wed, Feb 03, 2016 at 03:49:04PM -0800, Rick Jones wrote:
> On 02/03/2016 03:32 PM, Stephen Hemminger wrote:
> 
> >But why check for valid value at all. At some point in the
> >future, there will be yet another speed adopted by some standard body
> >and the switch statement would need another value.
> >
> >Why not accept any value? This is a virtual device.
> >
> 
> And even for not-quite-virtual devices - such as a VC/FlexNIC in an HPE
> blade server there can be just about any speed set.  I think we went down a
> path of patching some things to address that many years ago.  It would be a
> shame to undo that.
> 
> rick

I'm not sure I understand. The question is in defining the UAPI.
We currently have:

 * @speed: Low bits of the speed
 * @speed_hi: Hi bits of the speed

with the assumption that all values come from the defines.
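
(For reference, the two fields are combined by the existing uapi helpers
roughly like this:

static inline __u32 ethtool_cmd_speed(const struct ethtool_cmd *ep)
{
	return (ep->speed_hi << 16) | ep->speed;
}

static inline void ethtool_cmd_speed_set(struct ethtool_cmd *ep, __u32 speed)
{
	ep->speed = (__u16)speed;
	ep->speed_hi = (__u16)(speed >> 16);
}

so "any value" would mean any 32-bit number built from those two halves.)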

So if we allow any value here we need to define what it means.

If the following is acceptable, then we can drop
most of validation:


--->
ethtool: future-proof interface for speed extensions

Many virtual and not quite virtual devices allow any speed to be set
through ethtool. Document this fact to make sure people don't assume the
enum lists all possible values.  Reserve values greater than INT_MAX for
future extension and to avoid conflict with SPEED_UNKNOWN.

Signed-off-by: Michael S. Tsirkin 



diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 57fa390..9462844 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -31,7 +31,7 @@
  * physical connectors and other link features that are
  * advertised through autonegotiation or enabled for
  * auto-detection.
- * @speed: Low bits of the speed
+ * @speed: Low bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
  * @duplex: Duplex mode; one of %DUPLEX_*
  * @port: Physical connector type; one of %PORT_*
  * @phy_address: MDIO address of PHY (transceiver); 0 or 255 if not
@@ -47,7 +47,7 @@
  * obsoleted by  ethtool_coalesce.  Read-only; deprecated.
  * @maxrxpkt: Historically used to report RX IRQ coalescing; now
  * obsoleted by  ethtool_coalesce.  Read-only; deprecated.
- * @speed_hi: High bits of the speed
+ * @speed_hi: High bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
  * @eth_tp_mdix: Ethernet twisted-pair MDI(-X) status; one of
  * %ETH_TP_MDI_*.  If the status is unknown or not applicable, the
  * value will be %ETH_TP_MDI_INVALID.  Read-only.
@@ -1303,7 +1303,7 @@ enum ethtool_sfeatures_retval_bits {
  * it was forced up into this mode or autonegotiated.
  */
 
-/* The forced speed, 10Mb, 100Mb, gigabit, [2.5|5|10|20|25|40|50|56|100]GbE. */
+/* The forced speed, in units of 1Mb. All values 0 to INT_MAX are legal. */
 #define SPEED_10   10
 #define SPEED_100  100
 #define SPEED_1000 1000


-- 
MST


Re: [PATCH net] sctp: do sanity checks before migrating the asoc

2016-02-04 Thread Marcelo Ricardo Leitner
On Wed, Feb 03, 2016 at 05:13:25PM +0100, Dmitry Vyukov wrote:
> On Tue, Jan 19, 2016 at 9:08 PM, Marcelo Ricardo Leitner
>  wrote:
> > Em 19-01-2016 17:55, Vlad Yasevich escreveu:
> >>
> >> On 01/19/2016 02:31 PM, Marcelo Ricardo Leitner wrote:
> >>>
> >>> Em 19-01-2016 16:37, Vlad Yasevich escreveu:
> 
>  On 01/19/2016 10:59 AM, Marcelo Ricardo Leitner wrote:
> >
> > Yes, not thrilled here either about connect-to-self.
> >
> > But there is a big difference on how both works. For rx we can just
> > look for wanted skbs
> > in rx queue, as they aren't going anywhere, but for tx I don't think we
> > can easily block
> > sctp_wfree() call because that may be happening on another CPU (or am I
> > mistaken here?
> > sctp still doesn't have RFS but even irqbalance could affect this
> > AFAICT) and more than
> > one skb may be in transit at a time.
> 
> 
>  The way it's done now, we wouldn't have to block sctp_wfree.  Chunks are
>  released under
>  lock when they are acked, so we are OK here.  The tx completions will
>  just put 1 byte back
>  to the socket associated with the tx'ed skb, and that should still be ok
>  as
>  sctp_packet_release_owner will call sk_free().
> >>>
> >>>
> >>> Please let me rephrase it. I'm actually worried about the asoc->base.sk
> >>> part of the story
> >>> and how it's fetched in sctp_wfree(). I think we can update that sk
> >>> pointer after
> >>> sock_wfree() has fetched it but not used it yet, possibly leading to
> >>> accounting it twice,
> >>> one during migration and one on sock_wfree.
> >>> In sock_wfree() it will update some sk stats like sk->sk_wmem_alloc,
> >>> among others.
> >>
> >>
> >> sctp_wfree() is only used on skbs that were created as sctp chunks to be
> >> transmitted.
> >> Right now, these skbs aren't actually submitted to the IP or to nic to be
> >> transmitted.
> >> They are queued at the association level (either in transports or in the
> >> outqueue).
> >> They are only freed during ACK processing.
> >>
> >> The ACK processing happens under a socket lock and thus asoc->base.sk can
> >> not move.
> >>
> >> The migration process also happens under a socket lock.  As a result,
> >> during migration
> >> we are guaranteed the chunk queues remain consistent and that
> >> asoc->base.sk linkage
> >> remains consistent.  In fact, if you look at the sctp_sock_migrate, we
> >> lock both
> >> sockets when we reassign the assoc->base.sk so we know both sockets are
> >> properly locked.
> >>
> >> So, I am not sure that what you are worried about can happen.  Please feel
> >> free to
> >> double-check the above of course.
> >
> >
> > Ohh, right. That makes sense. I'll rework the patch. Thanks Vlad.
> 
> 
> Hi Marcelo,
> 
> Any updates on this? I still see the leak.

Hi Dmitry,

No, not yet, and I'll be out for 3 weeks starting monday. So if I don't
get it by sunday, it will be a while, sorry.

  Marcelo



Re: net/tipc: memory leak in tipc_release

2016-02-04 Thread Dmitry Vyukov
On Thu, Dec 31, 2015 at 11:35 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program, if run in a parallel loop, leads to a leak of 2
> objects allocated in tipc_release:
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <unistd.h>
> #include <sys/syscall.h>
> #include <string.h>
> #include <stdint.h>
> #include <stdio.h>
>
> long r[86];
>
> int main()
> {
> memset(r, -1, sizeof(r));
> r[0] = syscall(SYS_mmap, 0x2000ul, 0x11000ul, 0x3ul,
> 0x32ul, 0xul, 0x0ul);
> r[1] = syscall(SYS_eventfd, 0x7ul, 0, 0, 0, 0, 0);
> r[2] = syscall(SYS_close, r[1], 0, 0, 0, 0, 0);
> r[3] = syscall(SYS_socket, 0x1eul, 0x2ul, 0x0ul, 0, 0, 0);
> r[4] = syscall(SYS_io_setup, 0x5ul, 0x20001d8bul, 0, 0, 0, 0);
> if (r[4] != -1)
> r[5] = *(uint64_t*)0x20001d8b;
> r[6] = syscall(SYS_fcntl, r[1], 0x406ul, r[3], 0, 0, 0);
> *(uint16_t*)0x20007000 = (uint16_t)0x27;
> *(uint32_t*)0x20007002 = (uint32_t)0x3;
> *(uint32_t*)0x20007006 = (uint32_t)0x6;
> *(uint32_t*)0x2000700a = (uint32_t)0x1;
> r[11] = syscall(SYS_connect, r[6], 0x20007000ul, 0x10ul, 0, 0, 0);
> r[12] = syscall(SYS_dup3, r[6], r[1], 0x8ul, 0, 0, 0);
> *(uint64_t*)0x20002000 = (uint64_t)0x20002fc0;
> *(uint64_t*)0x20002008 = (uint64_t)0x20002fd8;
> *(uint64_t*)0x20002010 = (uint64_t)0x2000246d;
> *(uint64_t*)0x20002fc0 = (uint64_t)0x8;
> *(uint32_t*)0x20002fc8 = (uint32_t)0x0;
> *(uint32_t*)0x20002fcc = (uint32_t)0x9;
> *(uint16_t*)0x20002fd0 = (uint16_t)0x5;
> *(uint16_t*)0x20002fd2 = (uint16_t)0x0;
> *(uint32_t*)0x20002fd4 = r[1];
> *(uint64_t*)0x20002fd8 = (uint64_t)0x20002934;
> *(uint64_t*)0x20002fe0 = (uint64_t)0x5e;
> *(uint64_t*)0x20002fe8 = (uint64_t)0xfff7;
> *(uint64_t*)0x20002ff0 = (uint64_t)0x20002000;
> *(uint32_t*)0x20002ff8 = (uint32_t)0x0;
> *(uint32_t*)0x20002ffc = r[1];
> *(uint64_t*)0x20002000 = (uint64_t)0x20003000;
> *(uint32_t*)0x20002008 = (uint32_t)0x5;
> *(uint32_t*)0x2000200c = (uint32_t)0x2;
> *(uint64_t*)0x20002010 = (uint64_t)0x1;
> *(uint64_t*)0x20002018 = (uint64_t)0x7;
> *(uint64_t*)0x20002020 = (uint64_t)0x2;
> *(uint64_t*)0x20002028 = (uint64_t)0x4;
> *(uint64_t*)0x20002030 = (uint64_t)0x0;
> *(uint64_t*)0x20002038 = (uint64_t)0x1;
> *(uint64_t*)0x20002040 = (uint64_t)0x4;
> *(uint64_t*)0x20002048 = (uint64_t)0x9;
> *(uint64_t*)0x20002fd8 = (uint64_t)0x5;
> *(uint32_t*)0x20002fe0 = (uint32_t)0x0;
> *(uint32_t*)0x20002fe4 = (uint32_t)0x8;
> *(uint16_t*)0x20002fe8 = (uint16_t)0x7;
> *(uint16_t*)0x20002fea = (uint16_t)0x;
> *(uint32_t*)0x20002fec = (uint32_t)0x;
> *(uint64_t*)0x20002ff0 = (uint64_t)0x20005fe3;
> *(uint64_t*)0x20002ff8 = (uint64_t)0x2e;
> *(uint64_t*)0x20003000 = (uint64_t)0x8;
> *(uint64_t*)0x20003008 = (uint64_t)0x20002a50;
> *(uint32_t*)0x20003010 = (uint32_t)0x1;
> *(uint32_t*)0x20003014 = r[1];
> *(uint64_t*)0x20002a50 = (uint64_t)0x20003000;
> *(uint32_t*)0x20002a58 = (uint32_t)0xb;
> *(uint32_t*)0x20002a5c = (uint32_t)0x1;
> *(uint64_t*)0x20002a60 = (uint64_t)0x5;
> *(uint64_t*)0x20002a68 = (uint64_t)0xacf;
> *(uint64_t*)0x20002a70 = (uint64_t)0x8a;
> *(uint64_t*)0x20002a78 = (uint64_t)0x3;
> *(uint64_t*)0x20002a80 = (uint64_t)0x8d;
> *(uint64_t*)0x20002a88 = (uint64_t)0xf5a;
> *(uint64_t*)0x20002a90 = (uint64_t)0xd94;
> *(uint64_t*)0x20002a98 = (uint64_t)0x9;
> *(uint64_t*)0x2000246d = (uint64_t)0x0;
> *(uint32_t*)0x20002475 = (uint32_t)0x0;
> *(uint32_t*)0x20002479 = (uint32_t)0x2;
> *(uint16_t*)0x2000247d = (uint16_t)0x2;
> *(uint16_t*)0x2000247f = (uint16_t)0x0;
> *(uint32_t*)0x20002481 = r[1];
> *(uint64_t*)0x20002485 = (uint64_t)0x20002d52;
> *(uint64_t*)0x2000248d = (uint64_t)0x11;
> *(uint64_t*)0x20002495 = (uint64_t)0x4;
> *(uint64_t*)0x2000249d = (uint64_t)0x20002fb0;
> *(uint32_t*)0x200024a5 = (uint32_t)0x1;
> *(uint32_t*)0x200024a9 = r[1];
> *(uint64_t*)0x20002fb0 = (uint64_t)0x20003000;
> *(uint32_t*)0x20002fb8 = (uint32_t)0x4;
> *(uint32_t*)0x20002fbc = (uint32_t)0x2;
> *(uint64_t*)0x20002fc0 = (uint64_t)0x3;
> *(uint64_t*)0x20002fc8 = (uint64_t)0x6;
> *(uint64_t*)0x20002fd0 = (uint64_t)0xe3;
> *(uint64_t*)0x20002fd8 = (uint64_t)0xee;
> *(uint64_t*)0x20002fe0 = (uint64_t)0x8;
> *(uint64_t*)0x20002fe8 = (uint64_t)0x1;
> *(uint64_t*)0x20002ff0 = (uint64_t)0x4;
> *(uint64_t*)0x20002ff8 = (uint64_t)0x8;
> r[85] = syscall(SYS_io_submit, r[5], 

[net-next 01/20] i40e: Add mac_filter_element at the end of the list instead of HEAD

2016-02-04 Thread Jeff Kirsher
From: Kiran Patil 

Add MAC filter element to the end of the list in the given order,
just to be tidy, and just in case there are ever any ordering issues in
the future.

Change-ID: Idc15276147593ea9393ac72c861f9c7905a791b4
Signed-off-by: Kiran Patil 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8f3b53e..d078a63 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1368,7 +1368,7 @@ struct i40e_mac_filter *i40e_add_filter(struct i40e_vsi 
*vsi,
f->changed = true;
 
	INIT_LIST_HEAD(&f->list);
-	list_add(&f->list, &vsi->mac_filter_list);
+	list_add_tail(&f->list, &vsi->mac_filter_list);
}
 
/* increment counter and add a new flag if needed */
-- 
2.5.0



[PATCH net-next] MAINTAINERS: Update tg3 maintainer

2016-02-04 Thread skallam
From: Siva Reddy Kallam 

Signed-off-by: Siva Reddy Kallam 
Signed-off-by: Michael Chan 
Acked-by: Prashant Sreedharan 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f678c37..5edcb8a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2424,6 +2424,7 @@ F:include/linux/bcm963xx_nvram.h
 F: include/linux/bcm963xx_tag.h
 
 BROADCOM TG3 GIGABIT ETHERNET DRIVER
+M: Siva Reddy Kallam 
 M: Prashant Sreedharan 
 M: Michael Chan 
 L: netdev@vger.kernel.org
-- 
1.9.1



[PATCH] bpf_dbg: do not initialise statics to 0

2016-02-04 Thread Wei Tang
This patch fixes the following checkpatch.pl error in bpf_dbg.c:

ERROR: do not initialise statics to 0

Signed-off-by: Wei Tang 
---
 tools/net/bpf_dbg.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/net/bpf_dbg.c b/tools/net/bpf_dbg.c
index 9a287be..4f254bc 100644
--- a/tools/net/bpf_dbg.c
+++ b/tools/net/bpf_dbg.c
@@ -129,16 +129,16 @@ struct bpf_regs {
 };
 
 static struct sock_filter bpf_image[BPF_MAXINSNS + 1];
-static unsigned int bpf_prog_len = 0;
+static unsigned int bpf_prog_len;
 
 static int bpf_breakpoints[64];
 static struct bpf_regs bpf_regs[BPF_MAXINSNS + 1];
 static struct bpf_regs bpf_curr;
-static unsigned int bpf_regs_len = 0;
+static unsigned int bpf_regs_len;
 
 static int pcap_fd = -1;
-static unsigned int pcap_packet = 0;
-static size_t pcap_map_size = 0;
+static unsigned int pcap_packet;
+static size_t pcap_map_size;
 static char *pcap_ptr_va_start, *pcap_ptr_va_curr;
 
 static const char * const op_table[] = {
@@ -1172,7 +1172,7 @@ static int cmd_breakpoint(char *subcmd)
 
 static int cmd_run(char *num)
 {
-   static uint32_t pass = 0, fail = 0;
+   static uint32_t pass, fail;
bool has_limit = true;
int pkts = 0, i = 0;
 
-- 
1.9.1




RE: [PATCH 1/2] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-04 Thread David Laight
From: Larry Finger
> Sent: 03 February 2016 19:45
...
> The performance will depend on where you satisfy the condition. All switch 
> cases
> have the same execution time, but in the if .. else if .. else form, the 
> earlier
> tests execute more quickly. I'm not sure that one can make any blanket 
> statement
> about performance. Certainly, the switch version will be larger. For a switch
> with 8 cases plus default, the object code is 43 bytes larger than the nested
> ifs in a test program that I created. That is a significant penalty.

There is also the penalty of the (likely) data cache miss reading the jump 
table.
But given this code is all about generating a variable delay the execution
speed is probably irrelevant.

It would be much more interesting if the delay could be changed for sleeps.

David


RE: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread David Laight
From: Tom Herbert
> Sent: 03 February 2016 19:19
...
> + /* Main loop */
> +50:  adcq0*8(%rdi),%rax
> + adcq1*8(%rdi),%rax
> + adcq2*8(%rdi),%rax
> + adcq3*8(%rdi),%rax
> + adcq4*8(%rdi),%rax
> + adcq5*8(%rdi),%rax
> + adcq6*8(%rdi),%rax
> + adcq7*8(%rdi),%rax
> + adcq8*8(%rdi),%rax
> + adcq9*8(%rdi),%rax
> + adcq10*8(%rdi),%rax
> + adcq11*8(%rdi),%rax
> + adcq12*8(%rdi),%rax
> + adcq13*8(%rdi),%rax
> + adcq14*8(%rdi),%rax
> + adcq15*8(%rdi),%rax
> + lea 128(%rdi), %rdi
> + loop50b

I'd need convincing that unrolling the loop like that gives any significant 
gain.
You have a dependency chain on the carry flag, so there are delays between the
'adcq' instructions (these may be more significant than the memory reads from
L1 cache).

I also don't remember (might be wrong) the 'loop' instruction being executed 
quickly.
If 'loop' is fast then you will probably find that:

10: adcq 0(%rdi),%rax
lea  8(%rdi),%rdi
loop 10b

is just as fast since the three instructions could all be executed in parallel.
But I suspect that 'dec %cx; jnz 10b' is actually better (and might execute as
a single micro-op).
IIRC 'adc' and 'dec' will both have dependencies on the flags register
so cannot execute together (which is a shame here).

It is also possible that breaking the carry-chain dependency by doing 32bit
adds (possibly after 64bit reads) can be made to be faster.
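
A sketch of that idea in C (illustration only, not a drop-in for the assembly;
it assumes a buffer whose length is a multiple of 8 bytes and ignores the tail
handling the real csum_partial needs): split each 64-bit load into two 32-bit
halves and add them into independent 64-bit accumulators, so no add depends on
the carry out of the previous one and only the final fold resolves carries.

#include <stdint.h>
#include <stddef.h>

static uint32_t csum_nocarry(const void *buf, size_t len)
{
	const uint64_t *p = buf;
	uint64_t lo = 0, hi = 0;
	size_t i;

	/* 64-bit reads, 32-bit adds: the two sums are independent, no adc chain */
	for (i = 0; i < len / 8; i++) {
		uint64_t w = p[i];

		lo += (uint32_t)w;
		hi += (uint32_t)(w >> 32);
	}

	/* fold down to a 16-bit one's-complement sum with end-around carries */
	lo += hi;
	while (lo >> 16)
		lo = (lo & 0xffffu) + (lo >> 16);
	return (uint32_t)lo;
}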

David



[net-next 02/20] i40e/i40evf: Fix RSS rx-flow-hash configuration through ethtool

2016-02-04 Thread Jeff Kirsher
From: Anjali Singhai Jain 

This patch fixes the Hash PCTYPE enable for X722 since it supports
a broader selection of PCTYPES for TCP and UDP.

This patch also fixes a bug in XL710, X710, and X722 support for RSS:
as of now we cannot reduce the 4-tuple for RSS for TCP/IPv4/IPv6 or
UDP/IPv4/IPv6 packets, since this requires a product feature change
that comes in a later release.

A VF should never be allowed to change the tuples for RSS for any
PCTYPE since that's a global setting for the device in case of i40e
devices.

Change-ID: I0ee7203c9b24813260f58f3220798bc9d9ac4a12
Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 14 +++-
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 40 +-
 2 files changed, 12 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 29d5833..c8b9dca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2166,8 +2166,7 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case TCP_V4_FLOW:
switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
case 0:
-   hena &= ~BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
-   break;
+   return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
break;
@@ -2178,8 +2177,7 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case TCP_V6_FLOW:
switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
case 0:
-   hena &= ~BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
-   break;
+   return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
break;
@@ -2190,9 +2188,7 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case UDP_V4_FLOW:
switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
case 0:
-   hena &= ~(BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_UDP) |
- BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV4));
-   break;
+   return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
hena |= (BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_UDP) |
 BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV4));
@@ -2204,9 +2200,7 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case UDP_V6_FLOW:
switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
case 0:
-   hena &= ~(BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_UDP) |
- BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV6));
-   break;
+   return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
hena |= (BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_UDP) |
 BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV6));
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index a4c9feb..8906785 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -477,54 +477,30 @@ static int i40evf_set_rss_hash_opt(struct i40evf_adapter 
*adapter,
 
switch (nfc->flow_type) {
case TCP_V4_FLOW:
-   switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
-   case 0:
-   hena &= ~BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
-   break;
-   case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+   if (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3))
hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
-   break;
-   default:
+   else
return -EINVAL;
-   }
break;
case TCP_V6_FLOW:
-   switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
-   case 0:
-   hena &= ~BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
-   break;
-   case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+   if (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3))
hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
-   break;
-   

[net-next 14/20] i40e/i40evf: Use private workqueue

2016-02-04 Thread Jeff Kirsher
From: Jesse Brandeburg 

As done per ixgbe, use a private workqueue to avoid blocking the
system workqueue.  This avoids some strange side effects when
some other entity is depending on the system work queue.

Change-ID: Ic8ba08f5b03696cf638b21afd25fbae7738d55ee
Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 15 ++-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 10 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index bd81a97..3e482bc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -112,6 +112,8 @@ MODULE_DESCRIPTION("Intel(R) Ethernet Connection XL710 
Network Driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 
+static struct workqueue_struct *i40e_wq;
+
 /**
  * i40e_allocate_dma_mem_d - OS specific memory alloc for shared code
  * @hw:   pointer to the HW structure
@@ -297,7 +299,7 @@ static void i40e_service_event_schedule(struct i40e_pf *pf)
 if (!test_bit(__I40E_DOWN, &pf->state) &&
 !test_bit(__I40E_RESET_RECOVERY_PENDING, &pf->state) &&
 !test_and_set_bit(__I40E_SERVICE_SCHED, &pf->state))
-   schedule_work(&pf->service_task);
+   queue_work(i40e_wq, &pf->service_task);
 }
 
 /**
@@ -11470,6 +11472,16 @@ static int __init i40e_init_module(void)
i40e_driver_string, i40e_driver_version_str);
pr_info("%s: %s\n", i40e_driver_name, i40e_copyright);
 
+   /* we will see if single thread per module is enough for now,
+* it can't be any worse than using the system workqueue which
+* was already single threaded
+*/
+   i40e_wq = create_singlethread_workqueue(i40e_driver_name);
+   if (!i40e_wq) {
+   pr_err("%s: Failed to create workqueue\n", i40e_driver_name);
+   return -ENOMEM;
+   }
+
i40e_dbg_init();
	return pci_register_driver(&i40e_driver);
 }
@@ -11484,6 +11496,7 @@ module_init(i40e_init_module);
 static void __exit i40e_exit_module(void)
 {
	pci_unregister_driver(&i40e_driver);
+   destroy_workqueue(i40e_wq);
i40e_dbg_exit();
 }
 module_exit(i40e_exit_module);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 615ad0f..66964eb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -69,6 +69,8 @@ MODULE_DESCRIPTION("Intel(R) XL710 X710 Virtual Function 
Network Driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 
+static struct workqueue_struct *i40evf_wq;
+
 /**
  * i40evf_allocate_dma_mem_d - OS specific memory alloc for shared code
  * @hw:   pointer to the HW structure
@@ -182,7 +184,7 @@ static void i40evf_tx_timeout(struct net_device *netdev)
if (!(adapter->flags & (I40EVF_FLAG_RESET_PENDING |
I40EVF_FLAG_RESET_NEEDED))) {
adapter->flags |= I40EVF_FLAG_RESET_NEEDED;
-   schedule_work(&adapter->reset_task);
+   queue_work(i40evf_wq, &adapter->reset_task);
}
 }
 
@@ -2895,6 +2897,11 @@ static int __init i40evf_init_module(void)
 
pr_info("%s\n", i40evf_copyright);
 
+   i40evf_wq = create_singlethread_workqueue(i40evf_driver_name);
+   if (!i40evf_wq) {
+   pr_err("%s: Failed to create workqueue\n", i40evf_driver_name);
+   return -ENOMEM;
+   }
	ret = pci_register_driver(&i40evf_driver);
return ret;
 }
@@ -2910,6 +2917,7 @@ module_init(i40evf_init_module);
 static void __exit i40evf_exit_module(void)
 {
	pci_unregister_driver(&i40evf_driver);
+   destroy_workqueue(i40evf_wq);
 }
 
 module_exit(i40evf_exit_module);
-- 
2.5.0



[net-next 19/20] i40e: AQ Add external power class to get link status

2016-02-04 Thread Jeff Kirsher
From: Shannon Nelson 

Add the new External Device Power Ability field to the get_link_status data
structure, using space from the reserved field at the end of the struct.

Signed-off-by: Shannon Nelson 
Acked-by: Kevin Scott 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 7 ++-
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 7 ++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index eab55ea..0e608d2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1758,7 +1758,12 @@ struct i40e_aqc_get_link_status {
u8  config;
 #define I40E_AQ_CONFIG_CRC_ENA 0x04
 #define I40E_AQ_CONFIG_PACING_MASK 0x78
-   u8  reserved[5];
+   u8  external_power_ability;
+#define I40E_AQ_LINK_POWER_CLASS_1 0x00
+#define I40E_AQ_LINK_POWER_CLASS_2 0x01
+#define I40E_AQ_LINK_POWER_CLASS_3 0x02
+#define I40E_AQ_LINK_POWER_CLASS_4 0x03
+   u8  reserved[4];
 };
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_get_link_status);
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 30b5a33..578b178 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1755,7 +1755,12 @@ struct i40e_aqc_get_link_status {
u8  config;
 #define I40E_AQ_CONFIG_CRC_ENA 0x04
 #define I40E_AQ_CONFIG_PACING_MASK 0x78
-   u8  reserved[5];
+   u8  external_power_ability;
+#define I40E_AQ_LINK_POWER_CLASS_1 0x00
+#define I40E_AQ_LINK_POWER_CLASS_2 0x01
+#define I40E_AQ_LINK_POWER_CLASS_3 0x02
+#define I40E_AQ_LINK_POWER_CLASS_4 0x03
+   u8  reserved[4];
 };
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_get_link_status);
-- 
2.5.0



[net-next 09/20] i40e: bump version to 1.4.10

2016-02-04 Thread Jeff Kirsher
From: Catherine Sullivan 

Bump.

Change-ID: Ic9a495feb9ab0606f953c3848b0acf67169d3930
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 486ae16..c88583e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -51,7 +51,7 @@ static const char i40e_driver_string[] =
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 4
-#define DRV_VERSION_BUILD 8
+#define DRV_VERSION_BUILD 10
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD)DRV_KERN
-- 
2.5.0



[net-next 00/20][pull request] 40GbE Intel Wired LAN Driver Updates 2016-02-03

2016-02-04 Thread Jeff Kirsher
This series contains updates to i40e and i40evf only.

Kiran adds the MAC filter element to the end of the list instead of HEAD
just in case there are ever any ordering issues in the future.

Anjali fixes several RSS issues, first fixes the hash PCTYPE enable for
X722 since it supports a broader selection of PCTYPES for TCP and UDP.
Then fixes a bug in XL710, X710, and X722 support for RSS since we cannot
reduce the 4-tuple for RSS for TCP/IPv4/IPv6 or UDP/IPv4/IPv6 packets
since this requires a product feature change coming in a later release.
Cleans up the reset code where the restart-autoneg workaround is
applied; since X722 does not need the workaround, a flag is added to indicate
which MAC and firmware versions require the workaround to be applied.
Adds new device IDs for X722 and code to support them.  Also
adds another way to access the RSS keys and lookup table, using the admin
queue, for X722 devices.

Catherine updates the driver to replace the MAC check with a feature
flag check for 100M SGMII, since it is only supported on X722 devices
currently.

Mitch reworks the VF driver to allow channel bonding, which was not
possible before this patch due to the asynchronous nature of the admin
queue mechanism.  Also fixes a rare case which causes a panic if the
VF driver is removed during reset recovery; this is resolved by setting the
ring pointers to NULL after freeing them.

Shannon cleans up the driver where device capabilities were defined in
two different places, and neither had all the definitions, so he
consolidates the definitions in the admin queue API.  Also adds the new
proxy-wake-on-lan capability bit available with the new X722 device.
Lastly, added the new External Device Power Ability field to the
get_link_status data structure by using a reserved field at the end
of the structure.

Jesse mimics the ixgbe driver's use of a private work queue in the i40e
and i40evf drivers to avoid blocking the system work queue.

Greg cleans up the driver to limit the firmware revision checks needed to
properly handle DCB configurations from the firmware to the older
devices which require these checks (specifically X710 and XL710 devices
only).

The following are changes since commit b45efa30a626e915192a6c548cd8642379cd47cc:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Anjali Singhai Jain (6):
  i40e/i40evf: Fix RSS rx-flow-hash configuration through ethtool
  i40e: Cleanup the code with respect to restarting autoneg
  i40e: add new device IDs for X722
  i40e: Extend ethtool RSS hooks for X722
  i40e/i40evf: Fix for UDP/TCP RSS for X722
  i40evf: add new write-back mode

Catherine Sullivan (3):
  i40e: Replace X722 mac check in ethtool get_settings
  i40e: bump version to 1.4.10
  i40e: add 100Mb ethtool reporting

Greg Bowers (1):
  i40e: Limit DCB FW version checks to X710/XL710 devices

Jesse Brandeburg (2):
  i40e: update features with right offload
  i40e/i40evf: Use private workqueue

Kiran Patil (1):
  i40e: Add mac_filter_element at the end of the list instead of HEAD

Mitch Williams (2):
  i40evf: allow channel bonding of VFs
  i40evf: null out ring pointers on free

Shannon Nelson (5):
  i40e: define function capabilities in only one place
  i40e: add new proxy-wol bit for X722
  i40e: AQ Add Run PHY Activity struct
  i40e: AQ Geneve cloud tunnel type
  i40e: AQ Add external power class to get link status

 drivers/net/ethernet/intel/i40e/i40e.h |  2 +
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  | 26 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c  | 87 +++
 drivers/net/ethernet/intel/i40e/i40e_dcb.c | 12 +--
 drivers/net/ethernet/intel/i40e/i40e_devids.h  |  2 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 38 ++---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 99 +++---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 12 +++
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h| 26 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  | 16 
 drivers/net/ethernet/intel/i40evf/i40evf.h |  1 +
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 57 ++---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c| 34 ++--
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c|  2 +
 14 files changed, 283 insertions(+), 131 deletions(-)

-- 
2.5.0



[net-next 03/20] i40e: Replace X722 mac check in ethtool get_settings

2016-02-04 Thread Jeff Kirsher
From: Catherine Sullivan 

100M SGMII is only supported on X722.  Replace the mac check with
a feature flag check that is only set for the X722 device.

Change-ID: I53452d9af6af8cd9dca8500215fbc6ce93418f52
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h | 1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 68f2204..47f6c0a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -339,6 +339,7 @@ struct i40e_pf {
 #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40)
 #define I40E_FLAG_GENEVE_OFFLOAD_CAPABLE   BIT_ULL(41)
 #define I40E_FLAG_NO_PCI_LINK_CHECKBIT_ULL(42)
+#define I40E_FLAG_100M_SGMII_CAPABLE   BIT_ULL(43)
 #define I40E_FLAG_PF_MAC   BIT_ULL(50)
 
/* tracks features that get auto disabled by errors */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c8b9dca..252a9dd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -340,7 +340,7 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
  SUPPORTED_1000baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
ecmd->advertising |= ADVERTISED_1000baseT_Full;
-   if (pf->hw.mac.type == I40E_MAC_X722) {
+   if (pf->flags & I40E_FLAG_100M_SGMII_CAPABLE) {
ecmd->supported |= SUPPORTED_100baseT_Full;
if (hw_link_info->requested_speeds &
I40E_LINK_SPEED_100MB)
-- 
2.5.0



[net-next 16/20] i40e: Limit DCB FW version checks to X710/XL710 devices

2016-02-04 Thread Jeff Kirsher
From: Greg Bowers 

X710/XL710 devices require FW version checks to properly handle DCB
configurations from the FW.  Newer devices do not, so limit these checks
to X710/XL710.

Signed-off-by: Greg Bowers 
Acked-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_dcb.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c 
b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
index 2691277..582daa7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
@@ -814,13 +814,15 @@ i40e_status i40e_get_dcb_config(struct i40e_hw *hw)
struct i40e_aqc_get_cee_dcb_cfg_resp cee_cfg;
struct i40e_aqc_get_cee_dcb_cfg_v1_resp cee_v1_cfg;
 
-   /* If Firmware version < v4.33 IEEE only */
-   if (((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver < 33)) ||
-   (hw->aq.fw_maj_ver < 4))
+   /* If Firmware version < v4.33 on X710/XL710, IEEE only */
+   if ((hw->mac.type == I40E_MAC_XL710) &&
+   (((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver < 33)) ||
+ (hw->aq.fw_maj_ver < 4)))
return i40e_get_ieee_dcb_config(hw);
 
-   /* If Firmware version == v4.33 use old CEE struct */
-   if ((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver == 33)) {
+   /* If Firmware version == v4.33 on X710/XL710, use old CEE struct */
+   if ((hw->mac.type == I40E_MAC_XL710) &&
+   ((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver == 33))) {
ret = i40e_aq_get_cee_dcb_config(hw, &cee_v1_cfg,
 sizeof(cee_v1_cfg), NULL);
if (!ret) {
-- 
2.5.0



[net-next 18/20] i40e: AQ Geneve cloud tunnel type

2016-02-04 Thread Jeff Kirsher
From: Shannon Nelson 

Fix the name of the new cloud tunnel type from the place-holder NGE
name to the official Geneve.  Also fix the spelling of the VXLAN type.

Signed-off-by: Shannon Nelson 
Acked-by: Kevin Scott 
Signed-off-by: Jeff Kirsher 
Tested-by: Andrew Bowers 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 4 ++--
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 9e340ca..eab55ea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1260,9 +1260,9 @@ struct i40e_aqc_add_remove_cloud_filters_element_data {
 
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_SHIFT  9
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_MASK   0x1E00
-#define I40E_AQC_ADD_CLOUD_TNL_TYPE_XVLAN  0
+#define I40E_AQC_ADD_CLOUD_TNL_TYPE_VXLAN  0
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_NVGRE_OMAC 1
-#define I40E_AQC_ADD_CLOUD_TNL_TYPE_NGE2
+#define I40E_AQC_ADD_CLOUD_TNL_TYPE_GENEVE 2
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_IP 3
 
__le32  tenant_id;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 51d83c6..30b5a33 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1257,9 +1257,9 @@ struct i40e_aqc_add_remove_cloud_filters_element_data {
 
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_SHIFT  9
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_MASK   0x1E00
-#define I40E_AQC_ADD_CLOUD_TNL_TYPE_XVLAN  0
+#define I40E_AQC_ADD_CLOUD_TNL_TYPE_VXLAN  0
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_NVGRE_OMAC 1
-#define I40E_AQC_ADD_CLOUD_TNL_TYPE_NGE2
+#define I40E_AQC_ADD_CLOUD_TNL_TYPE_GENEVE 2
 #define I40E_AQC_ADD_CLOUD_TNL_TYPE_IP 3
 
__le32  tenant_id;
-- 
2.5.0



[net-next 05/20] i40e: define function capabilities in only one place

2016-02-04 Thread Jeff Kirsher
From: Shannon Nelson 

The device capabilities were defined in two places, and neither had all
the definitions.  It really belongs with the AQ API definition, so this
patch removes the other set of definitions and fills out the missing item.

Change-ID: I273ba7d79a476cd11d2e0ca5825fec1716740de2
Signed-off-by: Shannon Nelson 
Acked-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_common.c  | 85 +++---
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h|  1 +
 3 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index b22012a..256ce65 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -422,6 +422,7 @@ struct i40e_aqc_list_capabilities_element_resp {
 #define I40E_AQ_CAP_ID_LED 0x0061
 #define I40E_AQ_CAP_ID_SDP 0x0062
 #define I40E_AQ_CAP_ID_MDIO0x0063
+#define I40E_AQ_CAP_ID_WSR_PROT0x0064
 #define I40E_AQ_CAP_ID_FLEX10  0x00F1
 #define I40E_AQ_CAP_ID_CEM 0x00F2
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 6a034dd..4bdb08b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2765,35 +2765,6 @@ i40e_aq_erase_nvm_exit:
return status;
 }
 
-#define I40E_DEV_FUNC_CAP_SWITCH_MODE  0x01
-#define I40E_DEV_FUNC_CAP_MGMT_MODE0x02
-#define I40E_DEV_FUNC_CAP_NPAR 0x03
-#define I40E_DEV_FUNC_CAP_OS2BMC   0x04
-#define I40E_DEV_FUNC_CAP_VALID_FUNC   0x05
-#define I40E_DEV_FUNC_CAP_SRIOV_1_10x12
-#define I40E_DEV_FUNC_CAP_VF   0x13
-#define I40E_DEV_FUNC_CAP_VMDQ 0x14
-#define I40E_DEV_FUNC_CAP_802_1_QBG0x15
-#define I40E_DEV_FUNC_CAP_802_1_QBH0x16
-#define I40E_DEV_FUNC_CAP_VSI  0x17
-#define I40E_DEV_FUNC_CAP_DCB  0x18
-#define I40E_DEV_FUNC_CAP_FCOE 0x21
-#define I40E_DEV_FUNC_CAP_ISCSI0x22
-#define I40E_DEV_FUNC_CAP_RSS  0x40
-#define I40E_DEV_FUNC_CAP_RX_QUEUES0x41
-#define I40E_DEV_FUNC_CAP_TX_QUEUES0x42
-#define I40E_DEV_FUNC_CAP_MSIX 0x43
-#define I40E_DEV_FUNC_CAP_MSIX_VF  0x44
-#define I40E_DEV_FUNC_CAP_FLOW_DIRECTOR0x45
-#define I40E_DEV_FUNC_CAP_IEEE_15880x46
-#define I40E_DEV_FUNC_CAP_FLEX10   0xF1
-#define I40E_DEV_FUNC_CAP_CEM  0xF2
-#define I40E_DEV_FUNC_CAP_IWARP0x51
-#define I40E_DEV_FUNC_CAP_LED  0x61
-#define I40E_DEV_FUNC_CAP_SDP  0x62
-#define I40E_DEV_FUNC_CAP_MDIO 0x63
-#define I40E_DEV_FUNC_CAP_WR_CSR_PROT  0x64
-
 /**
  * i40e_parse_discover_capabilities
  * @hw: pointer to the hw struct
@@ -2832,79 +2803,79 @@ static void i40e_parse_discover_capabilities(struct 
i40e_hw *hw, void *buff,
major_rev = cap->major_rev;
 
switch (id) {
-   case I40E_DEV_FUNC_CAP_SWITCH_MODE:
+   case I40E_AQ_CAP_ID_SWITCH_MODE:
p->switch_mode = number;
break;
-   case I40E_DEV_FUNC_CAP_MGMT_MODE:
+   case I40E_AQ_CAP_ID_MNG_MODE:
p->management_mode = number;
break;
-   case I40E_DEV_FUNC_CAP_NPAR:
+   case I40E_AQ_CAP_ID_NPAR_ACTIVE:
p->npar_enable = number;
break;
-   case I40E_DEV_FUNC_CAP_OS2BMC:
+   case I40E_AQ_CAP_ID_OS2BMC_CAP:
p->os2bmc = number;
break;
-   case I40E_DEV_FUNC_CAP_VALID_FUNC:
+   case I40E_AQ_CAP_ID_FUNCTIONS_VALID:
p->valid_functions = number;
break;
-   case I40E_DEV_FUNC_CAP_SRIOV_1_1:
+   case I40E_AQ_CAP_ID_SRIOV:
if (number == 1)
p->sr_iov_1_1 = true;
break;
-   case I40E_DEV_FUNC_CAP_VF:
+   case I40E_AQ_CAP_ID_VF:
p->num_vfs = number;
p->vf_base_id = logical_id;
break;
-   case I40E_DEV_FUNC_CAP_VMDQ:
+   case I40E_AQ_CAP_ID_VMDQ:
if (number == 1)
p->vmdq = true;
break;
-   case I40E_DEV_FUNC_CAP_802_1_QBG:
+   case I40E_AQ_CAP_ID_8021QBG:
if (number == 1)
p->evb_802_1_qbg = true;

Re: [PATCH net-next v5 2/2] virtio_net: add ethtool support for set and get of settings

2016-02-04 Thread Nikolay Aleksandrov
On 02/04/2016 01:21 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 03, 2016 at 04:04:37AM +0100, Nikolay Aleksandrov wrote:
>> From: Nikolay Aleksandrov 
[snip]
>>  struct padded_vnet_hdr {
>> @@ -1376,6 +1380,58 @@ static void virtnet_get_channels(struct net_device 
>> *dev,
>>  channels->other_count = 0;
>>  }
>>  
>> +/* Check if the user is trying to change anything besides speed/duplex */
>> +static bool virtnet_validate_ethtool_cmd(const struct ethtool_cmd *cmd)
>> +{
>> +struct ethtool_cmd diff1 = *cmd;
>> +struct ethtool_cmd diff2 = {};
>> +
>> +/* advertising and cmd are usually set, ignore port because we set it */
> 
> We set it where?
If you're asking about advertising - ethtool sets it automatically when the
user tries to set both speed and duplex together.

> Instead of this, should not we set diff2.port to PORT_OTHER?
> 
Yes, that will validate it too.
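
For concreteness, that variant would look roughly like this (a sketch of what
the respin could do, not the final patch):

static bool virtnet_validate_ethtool_cmd(const struct ethtool_cmd *cmd)
{
	struct ethtool_cmd diff1 = *cmd;
	struct ethtool_cmd diff2 = {};

	/* cmd is always set; expect port to be exactly PORT_OTHER */
	ethtool_cmd_speed_set(&diff1, 0);
	diff1.advertising = 0;
	diff1.duplex = 0;
	diff1.cmd = 0;
	diff2.port = PORT_OTHER;

	return !memcmp(&diff1, &diff2, sizeof(diff1));
}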

>> +ethtool_cmd_speed_set(&diff1, 0);
>> +diff1.advertising = 0;
>> +diff1.duplex = 0;
>> +diff1.port = 0;
>> +diff1.cmd = 0;
>> +
>> +return !memcmp(&diff1, &diff2, sizeof(diff1));
>> +}
>> +
[snip]



net: memory leak in ip_cmsg_send

2016-02-04 Thread Dmitry Vyukov
Hello,

I've hit the following memory leak while running syzkaller fuzzer:

unreferenced object 0x88002ea39708 (size 64):
  comm "syz-executor", pid 19887, jiffies 4295848369 (age 8.676s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 00  
  backtrace:
[< inline >] kzalloc include/linux/slab.h:607
[< inline >] ip_options_get_alloc net/ipv4/ip_options.c:515
[] ip_options_get+0x34/0x90 net/ipv4/ip_options.c:551
[] ip_cmsg_send+0x5bd/0x6f0 net/ipv4/ip_sockglue.c:252
[] raw_sendmsg+0xa52/0x25a0 net/ieee802154/socket.c:738
[] inet_sendmsg+0x2f7/0x4c0 net/ipv4/af_inet.c:736
[< inline >] sock_sendmsg_nosec net/socket.c:611
[] sock_sendmsg+0xca/0x110 net/socket.c:621
[] ___sys_sendmsg+0x72a/0x840 net/socket.c:1947
[] __sys_sendmsg+0xce/0x170 net/socket.c:1981
[< inline >] SYSC_sendmsg net/socket.c:1992
[] SyS_sendmsg+0x2d/0x50 net/socket.c:1988

ip_cmsg_send seems to forget to free ipc->opt when it returns an error
(all callers expect that there is nothing to cleanup if it returns an
error).
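
To make the pattern concrete, here is a small self-contained C sketch
(hypothetical names, not kernel code) of the same shape: a helper allocates
into a caller-provided cookie and must free that allocation itself before
returning an error, because its callers assume there is nothing to clean up
on failure:

#include <stdlib.h>

struct cookie { void *opt; };

/* Returns 0 on success, -1 on error. On error the caller assumes nothing
 * was allocated, so the helper must free opt itself (the missing step in
 * the reported leak).
 */
static int parse_cmsg(struct cookie *c, int fail_later)
{
	c->opt = malloc(64);          /* analogue of ip_options_get() */
	if (!c->opt)
		return -1;

	if (fail_later) {             /* a later cmsg entry is invalid */
		free(c->opt);         /* without this, opt leaks */
		c->opt = NULL;
		return -1;
	}
	return 0;
}

int main(void)
{
	struct cookie c = { 0 };

	return parse_cmsg(&c, 1);
}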

On commit 34229b277480f46c1e9a19f027f30b074512e68b.


Re: gigaset: memory leak in gigaset_initcshw

2016-02-04 Thread Dmitry Vyukov
On Wed, Feb 3, 2016 at 8:11 PM, Paul Bolle  wrote:
> Hi Dmitry,
>
> On wo, 2016-02-03 at 17:16 +0100, Paul Bolle wrote:
>> The above should provide me with enough information to figure out
>> what's going on here.
>
> I've instrumented ser_gigaset with some printk's. Basically I added the
> stuff pasted at the end of this message. In 10,000 runs of the program
> syzkaller generated, the added printk's suggest that struct ser_cardstate
> is freed every time.
>
> (Note that this was done on a machine that, probably like the VM
> syzkaller was running in, doesn't have the clunky hardware that this
> driver manages attached.)
>
> Before I dive deeper into this: can you reproduce this leak? Is it
> perhaps a one in gazillion runs thing? Do you have the logs of a run
> that warned about this leak at hand?
>
> Thanks,
>
>
> Paul Bolle
>
> @@ -375,9 +377,12 @@ static void gigaset_device_release(struct device *dev)
>  {
> struct cardstate *cs = dev_get_drvdata(dev);
>
> -   if (!cs)
> +   if (!cs) {
> +   pr_info("%s: no cardstate", __func__);
> return;
> +   }
> dev_set_drvdata(dev, NULL);
> +   pr_info("%s: kfree(%p)", __func__, cs->hw.ser);
> kfree(cs->hw.ser);
> cs->hw.ser = NULL;
>  }
> @@ -392,6 +397,7 @@ static int gigaset_initcshw(struct cardstate *cs)
> struct ser_cardstate *scs;
>
> scs = kzalloc(sizeof(struct ser_cardstate), GFP_KERNEL);
> +   pr_info("%s: scs = %p", __func__, scs);
> if (!scs) {
> pr_err("out of memory\n");
> return -ENOMEM;



Forgot to mention that you need to run it in a parallel loop, sorry.

This one should do:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <stdio.h>

void work()
{
  long r[7];
  memset(r, -1, sizeof(r));
  r[0] = syscall(SYS_mmap, 0x2000ul, 0x1ul, 0x3ul, 0x32ul,
 0xul, 0x0ul);
  r[2] = syscall(SYS_open, "/dev/ptmx", 0x8002ul, 0x0ul, 0, 0, 0);
  *(uint32_t*)0x20002b1e = (uint32_t)0x10;
  r[4] = syscall(SYS_ioctl, r[2], 0x5423ul, 0x20002b1eul, 0, 0, 0);
  *(uint32_t*)0x20009000 = (uint32_t)0x7;
  r[6] = syscall(SYS_ioctl, r[2], 0x5423ul, 0x20009000ul, 0, 0, 0);
}

int main() {
  int running = 0, status;

  for (;;) {
while (running < 32) {
  if (fork() == 0) {
work();
exit(0);
  }
  running++;
}
    if (wait(&status) > 0)
  running--;
  }
}


While running it, sample /proc/slabinfo with:

# cat /proc/slabinfo | egrep "^kmalloc-2048"

It constantly grows.


[net-next 17/20] i40e: AQ Add Run PHY Activity struct

2016-02-04 Thread Jeff Kirsher
From: Shannon Nelson 

Add the AQ opcode and struct definitions for the Run PHY Activity command

Signed-off-by: Shannon Nelson 
Acked-by: Kevin Scott 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 13 +
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 13 +
 2 files changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index bff0995..9e340ca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -220,6 +220,7 @@ enum i40e_admin_queue_opc {
i40e_aqc_opc_get_phy_wol_caps   = 0x0621,
i40e_aqc_opc_set_phy_debug  = 0x0622,
i40e_aqc_opc_upload_ext_phy_fm  = 0x0625,
+   i40e_aqc_opc_run_phy_activity   = 0x0626,
 
/* NVM commands */
i40e_aqc_opc_nvm_read   = 0x0701,
@@ -1825,6 +1826,18 @@ enum i40e_aq_phy_reg_type {
I40E_AQC_PHY_REG_EXERNAL_MODULE = 0x3
 };
 
+/* Run PHY Activity (0x0626) */
+struct i40e_aqc_run_phy_activity {
+   __le16  activity_id;
+   u8  flags;
+   u8  reserved1;
+   __le32  control;
+   __le32  data;
+   u8  reserved2[4];
+};
+
+I40E_CHECK_CMD_LENGTH(i40e_aqc_run_phy_activity);
+
 /* NVM Read command (indirect 0x0701)
  * NVM Erase commands (direct 0x0702)
  * NVM Update commands (indirect 0x0703)
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 365a7d6..51d83c6 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -220,6 +220,7 @@ enum i40e_admin_queue_opc {
i40e_aqc_opc_get_phy_wol_caps   = 0x0621,
i40e_aqc_opc_set_phy_debug  = 0x0622,
i40e_aqc_opc_upload_ext_phy_fm  = 0x0625,
+   i40e_aqc_opc_run_phy_activity   = 0x0626,
 
/* NVM commands */
i40e_aqc_opc_nvm_read   = 0x0701,
@@ -1822,6 +1823,18 @@ enum i40e_aq_phy_reg_type {
I40E_AQC_PHY_REG_EXERNAL_MODULE = 0x3
 };
 
+/* Run PHY Activity (0x0626) */
+struct i40e_aqc_run_phy_activity {
+   __le16  activity_id;
+   u8  flags;
+   u8  reserved1;
+   __le32  control;
+   __le32  data;
+   u8  reserved2[4];
+};
+
+I40E_CHECK_CMD_LENGTH(i40e_aqc_run_phy_activity);
+
 /* NVM Read command (indirect 0x0701)
  * NVM Erase commands (direct 0x0702)
  * NVM Update commands (indirect 0x0703)
-- 
2.5.0



[net-next 12/20] i40e/i40evf: Fix for UDP/TCP RSS for X722

2016-02-04 Thread Jeff Kirsher
From: Anjali Singhai Jain 

The PCTYPES for the X710 and X722 families are different. This patch
makes adjustments for that.

Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 18 ++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  6 +
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 29 +++---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  8 +++---
 4 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 252a9dd..8a3f93d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2168,6 +2168,10 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case 0:
return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+   if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+   hena |=
+  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
+
hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
break;
default:
@@ -2179,6 +2183,10 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case 0:
return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+   if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+   hena |=
+  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
+
hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
break;
default:
@@ -2190,6 +2198,11 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case 0:
return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+   if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+   hena |=
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) |
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
+
hena |= (BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_UDP) |
 BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV4));
break;
@@ -2202,6 +2215,11 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, 
struct ethtool_rxnfc *nfc)
case 0:
return -EINVAL;
case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+   if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+   hena |=
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) |
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
+
hena |= (BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_UDP) |
 BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV6));
break;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 63e62f9..86aacb9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1213,6 +1213,12 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf 
*vf, u8 *msg)
vfres->vf_offload_flags |= I40E_VIRTCHNL_VF_OFFLOAD_RSS_REG;
}
 
+   if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) {
+   if (vf->driver_caps & I40E_VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2)
+   vfres->vf_offload_flags |=
+   I40E_VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2;
+   }
+
if (vf->driver_caps & I40E_VIRTCHNL_VF_OFFLOAD_RX_POLLING)
vfres->vf_offload_flags |= I40E_VIRTCHNL_VF_OFFLOAD_RX_POLLING;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 8906785..bd1c272 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -459,6 +459,7 @@ static int i40evf_set_rss_hash_opt(struct i40evf_adapter 
*adapter,
   struct ethtool_rxnfc *nfc)
 {
struct i40e_hw *hw = &adapter->hw;
+   u32 flags = adapter->vf_res->vf_offload_flags;
 
u64 hena = (u64)rd32(hw, I40E_VFQF_HENA(0)) |
   ((u64)rd32(hw, I40E_VFQF_HENA(1)) << 32);
@@ -477,19 +478,34 @@ static int i40evf_set_rss_hash_opt(struct i40evf_adapter 
*adapter,
 
switch (nfc->flow_type) {

[net-next 08/20] i40e: update features with right offload

2016-02-04 Thread Jeff Kirsher
From: Jesse Brandeburg 

Synchronize code bases and add SCTP offload support.

Change-ID: I9f99071f7176225479026930c387bf681a47494e
Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1a7022c..486ae16 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8947,11 +8947,11 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
np = netdev_priv(netdev);
np->vsi = vsi;
 
-   netdev->hw_enc_features |= NETIF_F_IP_CSUM   |
- NETIF_F_RXCSUM |
- NETIF_F_GSO_UDP_TUNNEL |
- NETIF_F_GSO_GRE|
- NETIF_F_TSO;
+   netdev->hw_enc_features |= NETIF_F_IP_CSUM|
+  NETIF_F_GSO_UDP_TUNNEL |
+  NETIF_F_GSO_GRE|
+  NETIF_F_TSO|
+  0;
 
netdev->features = NETIF_F_SG  |
   NETIF_F_IP_CSUM |
-- 
2.5.0



[net-next 06/20] i40evf: null out ring pointers on free

2016-02-04 Thread Jeff Kirsher
From: Mitch Williams 

Since we check these ring pointers to make sure we don't double-allocate
or double-free the rings, we had better null them out after we free
them. In very rare cases this can cause a panic if the driver is removed
during reset recovery.

Change-ID: Ib06eb4910a3058275c8f7ec5ef7f45baa4674f96
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index d1c4335..81d9584 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1122,7 +1122,9 @@ static void i40evf_free_queues(struct i40evf_adapter 
*adapter)
if (!adapter->vsi_res)
return;
kfree(adapter->tx_rings);
+   adapter->tx_rings = NULL;
kfree(adapter->rx_rings);
+   adapter->rx_rings = NULL;
 }
 
 /**
-- 
2.5.0



[net-next 20/20] i40e: add 100Mb ethtool reporting

2016-02-04 Thread Jeff Kirsher
From: Catherine Sullivan 

Add some missing reporting/advertisement of 100Mb capability
for adapters that support it.

Change-ID: I8b8523fbdc99517bec29d90c71b3744db11542ac
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 4 
 drivers/net/ethernet/intel/i40e/i40e_main.c| 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 8a3f93d..4549591 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -411,6 +411,10 @@ static void i40e_get_settings_link_down(struct i40e_hw *hw,
if (pf->hw.mac.type == I40E_MAC_X722) {
ecmd->supported |= SUPPORTED_100baseT_Full;
ecmd->advertising |= ADVERTISED_100baseT_Full;
+   if (pf->flags & I40E_FLAG_100M_SGMII_CAPABLE) {
+   ecmd->supported |= SUPPORTED_100baseT_Full;
+   ecmd->advertising |= ADVERTISED_100baseT_Full;
+   }
}
}
if (phy_types & I40E_CAP_PHY_TYPE_XAUI ||
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3e482bc..bad4577 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8453,7 +8453,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
 I40E_FLAG_OUTER_UDP_CSUM_CAPABLE |
 I40E_FLAG_WB_ON_ITR_CAPABLE |
 I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE |
-I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
+I40E_FLAG_GENEVE_OFFLOAD_CAPABLE |
+I40E_FLAG_100M_SGMII_CAPABLE;
}
pf->eeprom_version = 0xDEAD;
pf->lan_veb = I40E_NO_VEB;
-- 
2.5.0



[net-next 11/20] i40e: Extend ethtool RSS hooks for X722

2016-02-04 Thread Jeff Kirsher
From: Anjali Singhai Jain 

This patch adds another way to access the RSS keys and lut using the AQ
for X722 devices.

Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b3e671b..bd81a97 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7937,6 +7937,52 @@ static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_get_rss_aq - Get RSS keys and lut by using AQ commands
+ * @vsi: Pointer to vsi structure
+ * @seed: Buffer to store the hash keys
+ * @lut: Buffer to store the lookup table entries
+ * @lut_size: Size of buffer to store the lookup table entries
+ *
+ * Return 0 on success, negative on failure
+ */
+static int i40e_get_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
+  u8 *lut, u16 lut_size)
+{
+   struct i40e_pf *pf = vsi->back;
+   struct i40e_hw *hw = &pf->hw;
+   int ret = 0;
+
+   if (seed) {
+   ret = i40e_aq_get_rss_key(hw, vsi->id,
+   (struct i40e_aqc_get_set_rss_key_data *)seed);
+   if (ret) {
+   dev_info(&pf->pdev->dev,
+"Cannot get RSS key, err %s aq_err %s\n",
+i40e_stat_str(&pf->hw, ret),
+i40e_aq_str(&pf->hw,
+pf->hw.aq.asq_last_status));
+   return ret;
+   }
+   }
+
+   if (lut) {
+   bool pf_lut = vsi->type == I40E_VSI_MAIN ? true : false;
+
+   ret = i40e_aq_get_rss_lut(hw, vsi->id, pf_lut, lut, lut_size);
+   if (ret) {
+   dev_info(&pf->pdev->dev,
+"Cannot get RSS lut, err %s aq_err %s\n",
+i40e_stat_str(&pf->hw, ret),
+i40e_aq_str(&pf->hw,
+pf->hw.aq.asq_last_status));
+   return ret;
+   }
+   }
+
+   return ret;
+}
+
+/**
  * i40e_config_rss_reg - Configure RSS keys and lut by writing registers
  * @vsi: Pointer to vsi structure
  * @seed: RSS hash seed
@@ -8038,7 +8084,12 @@ int i40e_config_rss(struct i40e_vsi *vsi, u8 *seed, u8 
*lut, u16 lut_size)
  */
 int i40e_get_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
 {
-   return i40e_get_rss_reg(vsi, seed, lut, lut_size);
+   struct i40e_pf *pf = vsi->back;
+
+   if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE)
+   return i40e_get_rss_aq(vsi, seed, lut, lut_size);
+   else
+   return i40e_get_rss_reg(vsi, seed, lut, lut_size);
 }
 
 /**
-- 
2.5.0



[net-next 07/20] i40e: Cleanup the code with respect to restarting autoneg

2016-02-04 Thread Jeff Kirsher
From: Anjali Singhai Jain 

The restart-autoneg workaround does not apply to X722.
Added a flag to set it only for the right MAC and right FW version
where the workaround should be applied.

Signed-off-by: Anjali Singhai Jain 
Change-ID: I942c3ff40cccd1e56f424b1da776b020fe3c9d2a
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 47f6c0a..53ed3bd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -340,6 +340,7 @@ struct i40e_pf {
 #define I40E_FLAG_GENEVE_OFFLOAD_CAPABLE   BIT_ULL(41)
 #define I40E_FLAG_NO_PCI_LINK_CHECKBIT_ULL(42)
 #define I40E_FLAG_100M_SGMII_CAPABLE   BIT_ULL(43)
+#define I40E_FLAG_RESTART_AUTONEG  BIT_ULL(44)
 #define I40E_FLAG_PF_MAC   BIT_ULL(50)
 
/* tracks features that get auto disabled by errors */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d078a63..1a7022c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6889,8 +6889,7 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, 
bool reinit)
wr32(hw, I40E_REG_MSS, val);
}
 
-   if (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver < 33)) ||
-   (pf->hw.aq.fw_maj_ver < 4)) {
+   if (pf->flags & I40E_FLAG_RESTART_AUTONEG) {
msleep(75);
ret = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
if (ret)
@@ -8367,6 +8366,12 @@ static int i40e_sw_init(struct i40e_pf *pf)
 pf->hw.func_caps.fd_filters_best_effort;
}
 
+   if (((pf->hw.mac.type == I40E_MAC_X710) ||
+(pf->hw.mac.type == I40E_MAC_XL710)) &&
+   (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver < 33)) ||
+   (pf->hw.aq.fw_maj_ver < 4)))
+   pf->flags |= I40E_FLAG_RESTART_AUTONEG;
+
if (pf->hw.func_caps.vmdq) {
pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
pf->flags |= I40E_FLAG_VMDQ_ENABLED;
@@ -10904,8 +10909,7 @@ static int i40e_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
wr32(hw, I40E_REG_MSS, val);
}
 
-   if (((pf->hw.aq.fw_maj_ver == 4) && (pf->hw.aq.fw_min_ver < 33)) ||
-   (pf->hw.aq.fw_maj_ver < 4)) {
+   if (pf->flags & I40E_FLAG_RESTART_AUTONEG) {
msleep(75);
err = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
if (err)
-- 
2.5.0



[net-next 04/20] i40evf: allow channel bonding of VFs

2016-02-04 Thread Jeff Kirsher
From: Mitch Williams 

In some modes, bonding would not enslave VF interfaces. This is due to
bonding calling change_mtu and then immediately calling open. Because of
the asynchronous nature of the admin queue mechanism, the VF returns
-EBUSY to the open call, because it knows the previous operation hasn't
finished yet. This causes bonding to fail with a less-than-useful error
message.

To fix this, remove the check for pending operations at the beginning of
open. But this introduces a new bug where the driver will panic on a
quick close/open cycle. To fix that, we add a new driver state,
__I40EVF_DOWN_PENDING, that the driver enters when down is called. The
driver finally transitions to a fully DOWN state when it receives
confirmation from the PF driver that all the queues are disabled. This
allows open to complete even if there is a pending mtu change, and
bonding is finally happy.

Change-ID: I06f4c7e435d5bacbfceaa7c3f209e0ff04be21cc
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf.h  | 1 +
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 9 +
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 2 ++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h 
b/drivers/net/ethernet/intel/i40evf/i40evf.h
index be1b72b..9e15f68 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -173,6 +173,7 @@ enum i40evf_state_t {
__I40EVF_RESETTING, /* in reset */
/* Below here, watchdog is running */
__I40EVF_DOWN,  /* ready, can be opened */
+   __I40EVF_DOWN_PENDING,  /* descending, waiting for watchdog */
__I40EVF_TESTING,   /* in ethtool self-test */
__I40EVF_RUNNING,   /* opened, working */
 };
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 94da913..d1c4335 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1032,7 +1032,7 @@ void i40evf_down(struct i40evf_adapter *adapter)
struct net_device *netdev = adapter->netdev;
struct i40evf_mac_filter *f;
 
-   if (adapter->state == __I40EVF_DOWN)
+   if (adapter->state <= __I40EVF_DOWN_PENDING)
return;
 
while (test_and_set_bit(__I40EVF_IN_CRITICAL_TASK,
@@ -2142,7 +2142,8 @@ static int i40evf_open(struct net_device *netdev)
dev_err(>pdev->dev, "Unable to open device due to PF 
driver failure.\n");
return -EIO;
}
-   if (adapter->state != __I40EVF_DOWN || adapter->aq_required)
+
+   if (adapter->state != __I40EVF_DOWN)
return -EBUSY;
 
/* allocate transmit descriptors */
@@ -2197,14 +2198,14 @@ static int i40evf_close(struct net_device *netdev)
 {
struct i40evf_adapter *adapter = netdev_priv(netdev);
 
-   if (adapter->state <= __I40EVF_DOWN)
+   if (adapter->state <= __I40EVF_DOWN_PENDING)
return 0;
 
 
set_bit(__I40E_DOWN, &adapter->vsi.state);
 
i40evf_down(adapter);
-   adapter->state = __I40EVF_DOWN;
+   adapter->state = __I40EVF_DOWN_PENDING;
i40evf_free_traffic_irqs(adapter);
 
return 0;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index c1c5262..d3739cc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -804,6 +804,8 @@ void i40evf_virtchnl_completion(struct i40evf_adapter 
*adapter,
case I40E_VIRTCHNL_OP_DISABLE_QUEUES:
i40evf_free_all_tx_resources(adapter);
i40evf_free_all_rx_resources(adapter);
+   if (adapter->state == __I40EVF_DOWN_PENDING)
+   adapter->state = __I40EVF_DOWN;
break;
case I40E_VIRTCHNL_OP_VERSION:
case I40E_VIRTCHNL_OP_CONFIG_IRQ_MAP:
-- 
2.5.0



[net-next 15/20] i40e: add new proxy-wol bit for X722

2016-02-04 Thread Jeff Kirsher
From: Shannon Nelson 

Add the new proxy-wake-on-lan capability bit available with the
new X722 device.

Signed-off-by: Shannon Nelson 
Acked-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 256ce65..bff0995 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -402,6 +402,7 @@ struct i40e_aqc_list_capabilities_element_resp {
 #define I40E_AQ_CAP_ID_OS2BMC_CAP  0x0004
 #define I40E_AQ_CAP_ID_FUNCTIONS_VALID 0x0005
 #define I40E_AQ_CAP_ID_ALTERNATE_RAM   0x0006
+#define I40E_AQ_CAP_ID_WOL_AND_PROXY   0x0008
 #define I40E_AQ_CAP_ID_SRIOV   0x0012
 #define I40E_AQ_CAP_ID_VF  0x0013
 #define I40E_AQ_CAP_ID_VMDQ0x0014
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 0d3bc3b..365a7d6 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -399,6 +399,7 @@ struct i40e_aqc_list_capabilities_element_resp {
 #define I40E_AQ_CAP_ID_OS2BMC_CAP  0x0004
 #define I40E_AQ_CAP_ID_FUNCTIONS_VALID 0x0005
 #define I40E_AQ_CAP_ID_ALTERNATE_RAM   0x0006
+#define I40E_AQ_CAP_ID_WOL_AND_PROXY   0x0008
 #define I40E_AQ_CAP_ID_SRIOV   0x0012
 #define I40E_AQ_CAP_ID_VF  0x0013
 #define I40E_AQ_CAP_ID_VMDQ0x0014
-- 
2.5.0



[net-next 13/20] i40evf: add new write-back mode

2016-02-04 Thread Jeff Kirsher
From: Anjali Singhai Jain 

Add write-back on interrupt throttle rate timer expiration support
for the i40evf driver, when running on X722 devices.

Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  6 ++
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  | 16 
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  5 +
 3 files changed, 27 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 86aacb9..659d782 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1222,6 +1222,12 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf 
*vf, u8 *msg)
if (vf->driver_caps & I40E_VIRTCHNL_VF_OFFLOAD_RX_POLLING)
vfres->vf_offload_flags |= I40E_VIRTCHNL_VF_OFFLOAD_RX_POLLING;
 
+   if (pf->flags & I40E_FLAG_WB_ON_ITR_CAPABLE) {
+   if (vf->driver_caps & I40E_VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
+   vfres->vf_offload_flags |=
+   I40E_VIRTCHNL_VF_OFFLOAD_WB_ON_ITR;
+   }
+
vfres->num_vsis = num_vsis;
vfres->num_queue_pairs = vf->num_queue_pairs;
vfres->max_vectors = pf->hw.func_caps.num_msix_vectors_vf;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 7a00657..7d663fb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -252,6 +252,22 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, 
int budget)
tx_ring->q_vector->tx.total_bytes += total_bytes;
tx_ring->q_vector->tx.total_packets += total_packets;
 
+   if (tx_ring->flags & I40E_TXR_FLAGS_WB_ON_ITR) {
+   unsigned int j = 0;
+   /* check to see if there are < 4 descriptors
+* waiting to be written back, then kick the hardware to force
+* them to be written back in case we stay in NAPI.
+* In this mode on X722 we do not enable Interrupt.
+*/
+   j = i40evf_get_tx_pending(tx_ring);
+
+   if (budget &&
+   ((j / (WB_STRIDE + 1)) == 0) && (j > 0) &&
+   !test_bit(__I40E_DOWN, &tx_ring->vsi->state) &&
+   (I40E_DESC_UNUSED(tx_ring) != tx_ring->count))
+   tx_ring->arm_wb = true;
+   }
+
netdev_tx_completed_queue(netdev_get_tx_queue(tx_ring->netdev,
  tx_ring->queue_index),
  total_packets, total_bytes);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 798f0de..615ad0f 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2511,6 +2511,11 @@ static void i40evf_init_task(struct work_struct *work)
if (adapter->vf_res->vf_offload_flags &
I40E_VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
adapter->flags |= I40EVF_FLAG_WB_ON_ITR_CAPABLE;
+
+   if (adapter->vf_res->vf_offload_flags &
+   I40E_VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
+   adapter->flags |= I40EVF_FLAG_WB_ON_ITR_CAPABLE;
+
err = i40evf_request_misc_irq(adapter);
if (err)
goto err_sw_init;
-- 
2.5.0



[net-next 10/20] i40e: add new device IDs for X722

2016-02-04 Thread Jeff Kirsher
From: Anjali Singhai Jain 

Add the KX and QSFP device IDs for X722.

Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 2 ++
 drivers/net/ethernet/intel/i40e/i40e_devids.h | 2 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 4bdb08b..3b03a31 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -55,6 +55,8 @@ static i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_20G_KR2_A:
hw->mac.type = I40E_MAC_XL710;
break;
+   case I40E_DEV_ID_KX_X722:
+   case I40E_DEV_ID_QSFP_X722:
case I40E_DEV_ID_SFP_X722:
case I40E_DEV_ID_1G_BASE_T_X722:
case I40E_DEV_ID_10G_BASE_T_X722:
diff --git a/drivers/net/ethernet/intel/i40e/i40e_devids.h 
b/drivers/net/ethernet/intel/i40e/i40e_devids.h
index 448ef4c..f7ce5c7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_devids.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_devids.h
@@ -41,6 +41,8 @@
 #define I40E_DEV_ID_10G_BASE_T40x1589
 #define I40E_DEV_ID_VF 0x154C
 #define I40E_DEV_ID_VF_HV  0x1571
+#define I40E_DEV_ID_KX_X7220x37CE
+#define I40E_DEV_ID_QSFP_X722  0x37CF
 #define I40E_DEV_ID_SFP_X722   0x37D0
 #define I40E_DEV_ID_1G_BASE_T_X722 0x37D1
 #define I40E_DEV_ID_10G_BASE_T_X7220x37D2
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index c88583e..b3e671b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -90,6 +90,8 @@ static const struct pci_device_id i40e_pci_tbl[] = {
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T4), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_KX_X722), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_QSFP_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_SFP_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_1G_BASE_T_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T_X722), 0},
-- 
2.5.0



Re: [PATCH net-next v5 2/2] virtio_net: add ethtool support for set and get of settings

2016-02-04 Thread Michael S. Tsirkin
On Wed, Feb 03, 2016 at 10:19:04AM +0100, Nikolay Aleksandrov wrote:
> On 02/03/2016 04:04 AM, Nikolay Aleksandrov wrote:
> > From: Nikolay Aleksandrov 
> > 
> > This patch allows the user to set and retrieve speed and duplex of the
> > virtio_net device via ethtool. Having this functionality is very helpful
> > for simulating different environments and also enables the virtio_net
> > device to participate in operations where proper speed and duplex are
> > required (e.g. currently bonding lacp mode requires full duplex). Custom
> > speed and duplex are not allowed, the user-supplied settings are validated
> > before applying.
> > 
> > Example:
> > $ ethtool eth1
> > Settings for eth1:
> > ...
> > Speed: Unknown!
> > Duplex: Unknown! (255)
> > $ ethtool -s eth1 speed 1000 duplex full
> > $ ethtool eth1
> > Settings for eth1:
> > ...
> > Speed: 1000Mb/s
> > Duplex: Full
> > 
> > Based on a patch by Roopa Prabhu.
> > 
> > Signed-off-by: Nikolay Aleksandrov 
> > ---
> > v2: use the new ethtool speed/duplex validation functions and allow half
> > duplex to be set
> > v3: return error if the user tries to change anything besides speed/duplex
> > as per Michael's comment
> > We have to zero-out advertising as it gets set automatically by ethtool if
> > setting speed and duplex together.
> > v4: Set port type to PORT_OTHER
> > v5: null diff1.port because we set cmd->port now and ethtool returns it in
> > the set request, retested all cases
> > 
> 
> Hmm, nulling the advertising and ->port completely ignores them, i.e. won't 
> produce
> an error if the user actually specified a different value for either of them.
> We can check if the ->port matches what we returned, but there's no fix for
> advertising. I'm leaving both ignored for now, please let me know if you'd
> prefer otherwise.
> 
> Thanks,
>  Nik

I think I prefer validating port.
For advertising we don't allow enabling autonegotiation so ignoring
these is fine I think.

-- 
MST


[PATCH v3 4/4] ipv6: add option to drop unsolicited neighbor advertisements

2016-02-04 Thread Johannes Berg
From: Johannes Berg 

In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicited advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".

Signed-off-by: Johannes Berg 
---
 Documentation/networking/ip-sysctl.txt | 7 +++
 include/linux/ipv6.h   | 1 +
 include/uapi/linux/ipv6.h  | 1 +
 net/ipv6/addrconf.c| 8 
 net/ipv6/ndisc.c   | 9 +
 5 files changed, 26 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 428fb48a19fc..77992f1173c3 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1672,6 +1672,13 @@ drop_unicast_in_l2_multicast - BOOLEAN
 
By default this is turned off.
 
+drop_unsolicited_na - BOOLEAN
+   Drop all unsolicited neighbor advertisements, for example if there's
+   a known good NA proxy on the network and such frames need not be used
+   (or in the case of 802.11, must not be used to prevent attacks.)
+
+   By default this is turned off.
+
 icmp/*:
 ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 34317cb6a6fc..9231bfdc7c92 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -56,6 +56,7 @@ struct ipv6_devconf {
__s32   ndisc_notify;
__s32   suppress_frag_ndisc;
__s32   accept_ra_mtu;
+   __s32   drop_unsolicited_na;
struct ipv6_stable_secret {
bool initialized;
struct in6_addr secret;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 4c413570efe8..ec117b65d5a5 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -175,6 +175,7 @@ enum {
DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
+   DEVCONF_DROP_UNSOLICITED_NA,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 35f880bcf626..e7dd0a0c5126 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4673,6 +4673,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf 
*cnf,
/* we omit DEVCONF_STABLE_SECRET for now */
array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = 
cnf->drop_unicast_in_l2_multicast;
+   array[DEVCONF_DROP_UNSOLICITED_NA] = cnf->drop_unsolicited_na;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5742,6 +5743,13 @@ static struct addrconf_sysctl_table
.proc_handler   = proc_dointvec,
},
{
+   .procname   = "drop_unsolicited_na",
+   .data   = &ipv6_devconf.drop_unsolicited_na,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
+   {
/* sentinel */
}
},
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 3e0f855e1bea..12c84a53df4f 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -887,6 +887,7 @@ static void ndisc_recv_na(struct sk_buff *skb)
offsetof(struct nd_msg, opt));
struct ndisc_options ndopts;
struct net_device *dev = skb->dev;
+   struct inet6_dev *idev = __in6_dev_get(dev);
struct inet6_ifaddr *ifp;
struct neighbour *neigh;
 
@@ -906,6 +907,14 @@ static void ndisc_recv_na(struct sk_buff *skb)
return;
}
 
+   /* For some 802.11 wireless deployments (and possibly other networks),
+* there will be a NA proxy and unsolicited packets are attacks
+* and thus should not be accepted.
+*/
+   if (!msg->icmph.icmp6_solicited && idev &&
+   idev->cnf.drop_unsolicited_na)
+   return;
+
if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) {
ND_PRINTK(2, warn, "NS: invalid ND option\n");
return;
-- 
2.7.0



[PATCH v3 3/4] ipv6: add option to drop unicast encapsulated in L2 multicast

2016-02-04 Thread Johannes Berg
From: Johannes Berg 

In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Reviewed-by: Julian Anastasov 
Signed-off-by: Johannes Berg 
---
 Documentation/networking/ip-sysctl.txt |  6 ++
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ipv6.h  |  1 +
 net/ipv6/addrconf.c|  8 
 net/ipv6/ip6_input.c   | 10 ++
 5 files changed, 26 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 56bb6dd881bd..428fb48a19fc 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1666,6 +1666,12 @@ stable_secret - IPv6 address
 
By default the stable secret is unset.
 
+drop_unicast_in_l2_multicast - BOOLEAN
+   Drop any unicast IPv6 packets that are received in link-layer
+   multicast (or broadcast) frames.
+
+   By default this is turned off.
+
 icmp/*:
 ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 0ef2a97ccdb5..34317cb6a6fc 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -50,6 +50,7 @@ struct ipv6_devconf {
__s32   mc_forwarding;
 #endif
__s32   disable_ipv6;
+   __s32   drop_unicast_in_l2_multicast;
__s32   accept_dad;
__s32   force_tllao;
__s32   ndisc_notify;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 38b4fef20219..4c413570efe8 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -174,6 +174,7 @@ enum {
DEVCONF_USE_OIF_ADDRS_ONLY,
DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+   DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d72fa90d6feb..35f880bcf626 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4672,6 +4672,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf 
*cnf,
array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] = 
cnf->ignore_routes_with_linkdown;
/* we omit DEVCONF_STABLE_SECRET for now */
array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
+   array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = 
cnf->drop_unicast_in_l2_multicast;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5734,6 +5735,13 @@ static struct addrconf_sysctl_table
.proc_handler   = 
addrconf_sysctl_ignore_routes_with_linkdown,
},
{
+   .procname   = "drop_unicast_in_l2_multicast",
+   .data   = 
&ipv6_devconf.drop_unicast_in_l2_multicast,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
+   {
/* sentinel */
}
},
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9075acf081dd..31ac3c56da4b 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -134,6 +134,16 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, 
struct packet_type *pt
IPV6_ADDR_MC_SCOPE(&hdr->daddr) == 1)
goto err;
 
+   /* If enabled, drop unicast packets that were encapsulated in link-layer
+* multicast or broadcast to protect against the so-called "hole-196"
+* attack in 802.11 wireless.
+*/
+   if (!ipv6_addr_is_multicast(&hdr->daddr) &&
+   (skb->pkt_type == PACKET_BROADCAST ||
+skb->pkt_type == PACKET_MULTICAST) &&
+   idev->cnf.drop_unicast_in_l2_multicast)
+   goto err;
+
/* RFC4291 2.7
 * Nodes must not originate a packet to a multicast address whose scope
 * field contains the reserved value 0; if such a packet is received, it
-- 
2.7.0



[PATCH v3 1/4] ipv4: add option to drop unicast encapsulated in L2 multicast

2016-02-04 Thread Johannes Berg
From: Johannes Berg 

In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.

Reviewed-by: Julian Anastasov 
Signed-off-by: Johannes Berg 
---
 Documentation/networking/ip-sysctl.txt |  7 +++
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  2 ++
 net/ipv4/ip_input.c| 25 -
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 05915be86235..35c4c43dd8de 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1208,6 +1208,13 @@ promote_secondaries - BOOLEAN
promote a corresponding secondary IP address instead of
removing all the corresponding secondary IP addresses.
 
+drop_unicast_in_l2_multicast - BOOLEAN
+   Drop any unicast IP packets that are received in link-layer
+   multicast (or broadcast) frames.
+   This behavior (for multicast) is actually a SHOULD in RFC
+   1122, but is disabled by default for compatibility reasons.
+   Default: off (0)
+
 
 tag - INTEGER
Allows you to write a number, which can be used as required.
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d2ddbd..584834f7e95c 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+   IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cebd9d31e65a..dbbab28a52a4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2192,6 +2192,8 @@ static struct devinet_sysctl_table {
  "promote_secondaries"),
DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
  "route_localnet"),
+   DEVINET_SYSCTL_FLUSHING_ENTRY(DROP_UNICAST_IN_L2_MULTICAST,
+ "drop_unicast_in_l2_multicast"),
},
 };
 
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index b1209b63381f..997ef64a1c0b 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -359,8 +359,31 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, 
struct sk_buff *skb)
rt = skb_rtable(skb);
if (rt->rt_type == RTN_MULTICAST) {
IP_UPD_PO_STATS_BH(net, IPSTATS_MIB_INMCAST, skb->len);
-   } else if (rt->rt_type == RTN_BROADCAST)
+   } else if (rt->rt_type == RTN_BROADCAST) {
IP_UPD_PO_STATS_BH(net, IPSTATS_MIB_INBCAST, skb->len);
+   } else if (skb->pkt_type == PACKET_BROADCAST ||
+  skb->pkt_type == PACKET_MULTICAST) {
+   struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
+
+   /* RFC 1122 3.3.6:
+*
+*   When a host sends a datagram to a link-layer broadcast
+*   address, the IP destination address MUST be a legal IP
+*   broadcast or IP multicast address.
+*
+*   A host SHOULD silently discard a datagram that is received
+*   via a link-layer broadcast (see Section 2.4) but does not
+*   specify an IP multicast or broadcast destination address.
+*
+* This doesn't explicitly say L2 *broadcast*, but broadcast is
+* in a way a form of multicast and the most common use case for
+* this is 802.11 protecting against cross-station spoofing (the
+* so-called "hole-196" attack) so do it for both.
+*/
+   if (in_dev &&
+   IN_DEV_ORCONF(in_dev, DROP_UNICAST_IN_L2_MULTICAST))
+   goto drop;
+   }
 
return dst_input(skb);
 
-- 
2.7.0



[PATCH v3 2/4] ipv4: add option to drop gratuitous ARP packets

2016-02-04 Thread Johannes Berg
From: Johannes Berg 

In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_gratuitous_arp".

Signed-off-by: Johannes Berg 
---
 Documentation/networking/ip-sysctl.txt | 6 ++
 include/uapi/linux/ip.h| 1 +
 net/ipv4/arp.c | 8 
 net/ipv4/devinet.c | 2 ++
 4 files changed, 17 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 35c4c43dd8de..56bb6dd881bd 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1215,6 +1215,12 @@ drop_unicast_in_l2_multicast - BOOLEAN
1122, but is disabled by default for compatibility reasons.
Default: off (0)
 
+drop_gratuitous_arp - BOOLEAN
+   Drop all gratuitous ARP frames, for example if there's a known
+   good ARP proxy on the network and such frames need not be used
+   (or in the case of 802.11, must not be used to prevent attacks.)
+   Default: off (0)
+
 
 tag - INTEGER
Allows you to write a number, which can be used as required.
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 584834f7e95c..f291569768dd 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -166,6 +166,7 @@ enum
IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
+   IPV4_DEVCONF_DROP_GRATUITOUS_ARP,
__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 59b3e0e8fd51..c102eb5ac55c 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -735,6 +735,14 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
(!IN_DEV_ROUTE_LOCALNET(in_dev) && ipv4_is_loopback(tip)))
goto out;
 
+ /*
+  *For some 802.11 wireless deployments (and possibly other networks),
+  *there will be an ARP proxy and gratuitous ARP frames are attacks
+  *and thus should not be accepted.
+  */
+   if (sip == tip && IN_DEV_ORCONF(in_dev, DROP_GRATUITOUS_ARP))
+   goto out;
+
 /*
  * Special case: We must set Frame Relay source Q.922 address
  */
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index dbbab28a52a4..3d835313575e 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2185,6 +2185,8 @@ static struct devinet_sysctl_table {
"igmpv3_unsolicited_report_interval"),
DEVINET_SYSCTL_RW_ENTRY(IGNORE_ROUTES_WITH_LINKDOWN,
"ignore_routes_with_linkdown"),
+   DEVINET_SYSCTL_RW_ENTRY(DROP_GRATUITOUS_ARP,
+   "drop_gratuitous_arp"),
 
DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
-- 
2.7.0



Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload

2016-02-04 Thread Amir Vadai"
On Thu, Feb 04, 2016 at 12:23:02AM -0800, Fastabend, John R wrote:
> On 2/3/2016 11:30 PM, Amir Vadai" wrote:
> > On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
> >> This adds initial support for offloading the u32 tc classifier. This
> >> initial implementation only implements a few base matches and actions
> >> to illustrate the use of the infrastructure patches.
> >>
> >> However it is an interesting subset because it handles the u32 next
> >> hdr logic to correctly map tcp packets from ip headers using the ihl
> >> and protocol fields. After this is accepted we can extend the match
> >> and action fields easily by updating the model header file.
> >>
> >> Also only the drop action is supported initially.
> >>
> >> Here is a short test script,
> >>
> >>  #tc qdisc add dev eth4 ingress
> >>  #tc filter add dev eth4 parent : protocol ip \
> >>u32 ht 800: order 1 \
> >>match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
> >>
> >> <-- hardware has dst/src ip match rule installed -->
> >>
> >>  #tc filter del dev eth4 parent : prio 49152
> >>  #tc filter add dev eth4 parent : protocol ip prio 99 \
> >>handle 1: u32 divisor 1
> >>  #tc filter add dev eth4 protocol ip parent : prio 99 \
> >>u32 ht 800: order 1 link 1: \
> >>offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
> >>  #tc filter add dev eth4 parent : protocol ip \
> >>u32 ht 1: order 3 match tcp src 23  action drop
> >>
> >> <-- hardware has tcp src port rule installed -->
> >>
> >>  #tc qdisc del dev eth4 parent :
> >>
> >> <-- hardware cleaned up -->
> >>
> >> Signed-off-by: John Fastabend 
> >> ---
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe.h |3 
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  196 
> >> ++
> >>  3 files changed, 198 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
> >> b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> >> index 4b9156c..09c2d9b 100644
> >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > 
> > [...]
> > 
> >> @@ -8277,6 +8465,7 @@ static int ixgbe_set_features(struct net_device 
> >> *netdev,
> >> */
> >>switch (features & NETIF_F_NTUPLE) {
> >>case NETIF_F_NTUPLE:
> >> +  case NETIF_F_HW_TC:
> >>/* turn off ATR, enable perfect filters and reset */
> >>if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
> >>need_reset = true;
> > 
> > I think you have a bug here. I don't see how the NETIF_F_HW_TC case will
> > happen after masking 'features' out.
> > 
> 
> Ah I should have annotated this in the commit msg. I turn the feature
> off by default; to enable it, the user needs to run
> 
>   # ethtool -K ethx hw-tc-offload on
> 
> this is just a habit of mine to leave new features off by default for
> a bit until I work out some of the kinks. For example I found a case
> today where if you build loops into your u32 graph the hardware tables
> can get out of sync with the software tables. This is sort of extreme
> corner case not sure if anyone would really use u32 but it is valid
> and the hardware should abort correctly.
Yeah - that is nice :) But I was just pointing out a small typo which I
think you have.
The new case will never happen. You compare: (features & NETIF_F_NTUPLE) == 
NETIF_F_HW_TC
Also the comment before the switch should be modified.
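
A tiny standalone C example of the point (the constants are made up, not the
actual ixgbe values): a switch on (features & NTUPLE) can only ever evaluate
to 0 or NTUPLE, so a case label for a different bit is dead code:

#include <stdio.h>

#define NTUPLE 0x1
#define HW_TC  0x2

int main(void)
{
	unsigned int features = HW_TC;

	/* (features & NTUPLE) is either 0 or NTUPLE, so "case HW_TC:"
	 * below can never be reached - the typo being pointed out.
	 */
	switch (features & NTUPLE) {
	case NTUPLE:
	case HW_TC:
		printf("offload path taken\n");
		break;
	default:
		printf("offload path NOT taken, even though HW_TC is set\n");
	}
	return 0;
}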

> 
> Thanks,
> John
> 


Re: [PATCH net-next v5 1/2] ethtool: add speed/duplex validation functions

2016-02-04 Thread Michael S. Tsirkin
On Thu, Feb 04, 2016 at 10:32:26AM +1100, Stephen Hemminger wrote:
> On Wed,  3 Feb 2016 04:04:36 +0100
> Nikolay Aleksandrov  wrote:
> 
> >  
> > +static inline int ethtool_validate_speed(__u32 speed)
> > +{
> 
> 
> No need for inline.
> 
> But why check for valid value at all. At some point in the
> future, there will be yet another speed adopted by some standard body
> and the switch statement would need another value.
> 
> Why not accept any value? This is a virtual device.

It's virtual but often there's a physical backend behind it.  In the
future we will likely forward the values to and from that physical
device.  And if guest passes an unexpected value, host is unlikely to be
able to support it.

-- 
MST


[PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-04 Thread Kees Cook
Create the kstrtobool_from_user helper and move the strtobool logic into
the new kstrtobool (matching all the other kstrto* functions). Provide
an inline wrapper for existing strtobool callers.

Signed-off-by: Kees Cook 
---
 include/linux/kernel.h |  3 +++
 include/linux/string.h |  6 +-
 lib/kstrtox.c  | 35 +++
 lib/string.c   | 29 -
 4 files changed, 43 insertions(+), 30 deletions(-)
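
As a usage illustration only (the sysfs attribute below is hypothetical and
not part of this patch), callers would use the reworked helpers roughly like
this:

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

static bool enabled;

/* Hypothetical sysfs store handler: parse "y/Y/1/n/N/0" with kstrtobool().
 * The base argument is ignored, matching the other kstrto* signatures.
 */
static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
			     const char *buf, size_t count)
{
	int err = kstrtobool(buf, 0, &enabled);

	return err ? err : count;
}

/* From a debugfs/procfs .write handler, kstrtobool_from_user() does the
 * copy from userspace internally:
 *
 *	err = kstrtobool_from_user(ubuf, count, 0, &enabled);
 */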

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f31638c6e873..cdc25f47a23f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
base, u16 *res);
 int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
 int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
 int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
+int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
 
 int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
unsigned int base, unsigned long long *res);
 int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
unsigned int base, long long *res);
@@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user *s, 
size_t count, unsigne
 int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
unsigned int base, s16 *res);
 int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
unsigned int base, u8 *res);
 int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
unsigned int base, s8 *res);
+int __must_check kstrtobool_from_user(const char __user *s, size_t count,
+ unsigned int base, bool *res);
 
 static inline int __must_check kstrtou64_from_user(const char __user *s, 
size_t count, unsigned int base, u64 *res)
 {
diff --git a/include/linux/string.h b/include/linux/string.h
index 9eebc66d957a..d2fb21b1081d 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, int 
*argcp);
 extern void argv_free(char **argv);
 
 extern bool sysfs_streq(const char *s1, const char *s2);
-extern int strtobool(const char *s, bool *res);
+extern int kstrtobool(const char *s, unsigned int base, bool *res);
+static inline int strtobool(const char *s, bool *res)
+{
+   return kstrtobool(s, 0, res);
+}
 
 #ifdef CONFIG_BINARY_PRINTF
 int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index 94be244e8441..e18f088704d7 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
 }
 EXPORT_SYMBOL(kstrtos8);
 
+/**
+ * kstrtobool - convert common user inputs into boolean values
+ * @s: input string
+ * @base: ignored
+ * @res: result
+ *
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
+ * Otherwise it will return -EINVAL.  Value pointed to by res is
+ * updated upon finding a match.
+ */
+int kstrtobool(const char *s, unsigned int base, bool *res)
+{
+   if (!s)
+   return -EINVAL;
+
+   switch (s[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   *res = true;
+   return 0;
+   case 'n':
+   case 'N':
+   case '0':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
+
+   return -EINVAL;
+}
+EXPORT_SYMBOL(kstrtobool);
+
 #define kstrto_from_user(f, g, type)   \
 int f(const char __user *s, size_t count, unsigned int base, type *res)
\
 {  \
@@ -345,3 +379,4 @@ kstrto_from_user(kstrtou16_from_user,   kstrtou16,  
u16);
 kstrto_from_user(kstrtos16_from_user,  kstrtos16,  s16);
 kstrto_from_user(kstrtou8_from_user,   kstrtou8,   u8);
 kstrto_from_user(kstrtos8_from_user,   kstrtos8,   s8);
+kstrto_from_user(kstrtobool_from_user, kstrtobool, bool);
diff --git a/lib/string.c b/lib/string.c
index 0323c0d5629a..1a90db9bc6e1 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -630,35 +630,6 @@ bool sysfs_streq(const char *s1, const char *s2)
 }
 EXPORT_SYMBOL(sysfs_streq);
 
-/**
- * strtobool - convert common user inputs into boolean values
- * @s: input string
- * @res: result
- *
- * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
- * Otherwise it will return -EINVAL.  Value pointed to by res is
- * updated upon finding a match.
- */
-int strtobool(const char *s, bool *res)
-{
-   switch (s[0]) {
-   case 'y':
-   case 'Y':
-   case '1':
-   *res = true;
-   break;
-   case 'n':
-   case 'N':
-   case '0':
-   *res = false;
-   break;
-   default:
- 

[PATCH v3 net 6/7] net: mvneta: The mvneta_percpu_elect function should be atomic

2016-02-04 Thread Gregory CLEMENT
Electing a CPU must be done in an atomic way: it should be done after or
before the removal/insertion of a CPU and this function is not reentrant.

During the loop of mvneta_percpu_elect we associates the queues to the
CPUs, if there is a topology change during this loop, then the mapping
between the CPUs and the queues could be wrong. During this loop the
interrupt mask is also updating for each CPUs, It should not be changed
in the same time by other part of the driver.

This patch adds spinlock to create the needed critical sections.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 2d0e8a605ca9..b12a745a0e4c 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -370,6 +370,10 @@ struct mvneta_port {
struct net_device *dev;
struct notifier_block cpu_notifier;
int rxq_def;
+   /* Protect the access to the percpu interrupt registers,
+* ensuring that the configuration remains coherent.
+*/
+   spinlock_t lock;
 
/* Core clock */
struct clk *clk;
@@ -2855,6 +2859,12 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
 {
int elected_cpu = 0, max_cpu, cpu, i = 0;
 
+   /* Electing a CPU must be done in an atomic way: it should be
+* done after or before the removal/insertion of a CPU and
+* this function is not reentrant.
+*/
+   spin_lock(&pp->lock);
+
/* Use the cpu associated to the rxq when it is online, in all
 * the other cases, use the cpu 0 which can't be offline.
 */
@@ -2898,6 +2908,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
i++;
 
}
+   spin_unlock(&pp->lock);
 };
 
 static int mvneta_percpu_notifier(struct notifier_block *nfb,
@@ -2952,8 +2963,13 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
netif_tx_stop_all_queues(pp->dev);
+   /* Thanks to this lock we are sure that any pending
+* cpu election is done
+*/
+   spin_lock(&pp->lock);
/* Mask all ethernet port interrupts */
on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+   spin_unlock(&pp->lock);
 
napi_synchronize(&port->napi);
napi_disable(&port->napi);
-- 
2.5.0



[PATCH v3 net 5/7] net: mvneta: Modify the queue related fields from each cpu

2016-02-04 Thread Gregory CLEMENT
In the MVNETA_INTR_* registers, the queue-related fields are per CPU,
according to the datasheet (comments in [] are added by me):
"In a multi-CPU system, bits of RX[or TX] queues for which the access by
the reading[or writing] CPU is disabled are read as 0, and cannot be
cleared[or written]."

That means that each time we want to manipulate these bits we have to do
it on each CPU and not only on the current one.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 100 --
 1 file changed, 46 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 74f8158df2b0..2d0e8a605ca9 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1038,6 +1038,43 @@ static void mvneta_set_autoneg(struct mvneta_port *pp, 
int enable)
}
 }
 
+static void mvneta_percpu_unmask_interrupt(void *arg)
+{
+   struct mvneta_port *pp = arg;
+
+   /* All the queue are unmasked, but actually only the ones
+* mapped to this CPU will be unmasked
+*/
+   mvreg_write(pp, MVNETA_INTR_NEW_MASK,
+   MVNETA_RX_INTR_MASK_ALL |
+   MVNETA_TX_INTR_MASK_ALL |
+   MVNETA_MISCINTR_INTR_MASK);
+}
+
+static void mvneta_percpu_mask_interrupt(void *arg)
+{
+   struct mvneta_port *pp = arg;
+
+   /* All the queue are masked, but actually only the ones
+* mapped to this CPU will be masked
+*/
+   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
+   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
+   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+}
+
+static void mvneta_percpu_clear_intr_cause(void *arg)
+{
+   struct mvneta_port *pp = arg;
+
+   /* All the queue are cleared, but actually only the ones
+* mapped to this CPU will be cleared
+*/
+   mvreg_write(pp, MVNETA_INTR_NEW_CAUSE, 0);
+   mvreg_write(pp, MVNETA_INTR_MISC_CAUSE, 0);
+   mvreg_write(pp, MVNETA_INTR_OLD_CAUSE, 0);
+}
+
 /* This method sets defaults to the NETA port:
  * Clears interrupt Cause and Mask registers.
  * Clears all MAC tables.
@@ -1055,14 +1092,10 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
int max_cpu = num_present_cpus();
 
/* Clear all Cause registers */
-   mvreg_write(pp, MVNETA_INTR_NEW_CAUSE, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_CAUSE, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_CAUSE, 0);
+   on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
 
/* Mask all interrupts */
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
mvreg_write(pp, MVNETA_INTR_ENABLE, 0);
 
/* Enable MBUS Retry bit16 */
@@ -2528,31 +2561,6 @@ static int mvneta_setup_txqs(struct mvneta_port *pp)
return 0;
 }
 
-static void mvneta_percpu_unmask_interrupt(void *arg)
-{
-   struct mvneta_port *pp = arg;
-
-   /* All the queue are unmasked, but actually only the ones
-* maped to this CPU will be unmasked
-*/
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK,
-   MVNETA_RX_INTR_MASK_ALL |
-   MVNETA_TX_INTR_MASK_ALL |
-   MVNETA_MISCINTR_INTR_MASK);
-}
-
-static void mvneta_percpu_mask_interrupt(void *arg)
-{
-   struct mvneta_port *pp = arg;
-
-   /* All the queue are masked, but actually only the ones
-* maped to this CPU will be masked
-*/
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
-}
-
 static void mvneta_start_dev(struct mvneta_port *pp)
 {
int cpu;
@@ -2603,13 +2611,10 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
mvneta_port_disable(pp);
 
/* Clear all ethernet port interrupts */
-   mvreg_write(pp, MVNETA_INTR_MISC_CAUSE, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_CAUSE, 0);
+   on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
 
/* Mask all ethernet port interrupts */
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
 
mvneta_tx_reset(pp);
mvneta_rx_reset(pp);
@@ -2921,9 +2926,7 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
}
 
/* Mask all ethernet port interrupts */
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);

[PATCH V2 net-next] hv_netvsc: cleanup netdev feature flags for netvsc

2016-02-04 Thread Simon Xiao
1. Adding NETIF_F_TSO6 feature flag;
2. Adding NETIF_F_HW_CSUM. NETIF_F_IPV6_CSUM and NETIF_F_IP_CSUM are 
being deprecated;
3. Clean up the coding style of flag assignment by using a macro.

Signed-off-by: Simon Xiao 
Reviewed-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/netvsc_drv.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1d3a665..c72e5b8 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -43,6 +43,11 @@
 
 #define RING_SIZE_MIN 64
 #define LINKCHANGE_INT (2 * HZ)
+#define NETVSC_HW_FEATURES (NETIF_F_RXCSUM | \
+NETIF_F_SG | \
+NETIF_F_TSO | \
+NETIF_F_TSO6 | \
+NETIF_F_HW_CSUM)
 static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
@@ -1081,10 +1086,8 @@ static int netvsc_probe(struct hv_device *dev,
 
net->netdev_ops = &device_ops;
 
-   net->hw_features = NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_IP_CSUM |
-   NETIF_F_TSO;
-   net->features = NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_SG | NETIF_F_RXCSUM |
-   NETIF_F_IP_CSUM | NETIF_F_TSO;
+   net->hw_features = NETVSC_HW_FEATURES;
+   net->features = NETVSC_HW_FEATURES | NETIF_F_HW_VLAN_CTAG_TX;
 
net->ethtool_ops = &ethtool_ops;
SET_NETDEV_DEV(net, &dev->device);
-- 
2.5.0



[PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Kees Cook
This changes several users of manual "on"/"off" parsing to use strtobool.
(Which means they will now parse y/n/1/0 meaningfully too.)

Signed-off-by: Kees Cook 
Acked-by: Heiko Carstens 
Acked-by: Michael Ellerman 
Cc: x...@kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
---
 arch/powerpc/kernel/rtasd.c  |  9 ++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
 arch/s390/kernel/time.c  |  8 ++--
 arch/s390/kernel/topology.c  |  7 ++-
 arch/x86/kernel/aperture_64.c| 12 ++--
 include/linux/tick.h |  2 +-
 kernel/time/hrtimer.c| 10 ++
 kernel/time/tick-sched.c | 10 ++
 8 files changed, 15 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 5a2c049c1c61..567ed5a2f43a 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
 static unsigned int event_scan;
 static unsigned int rtas_event_scan_rate;
 
-static int full_rtas_msgs = 0;
+static bool full_rtas_msgs;
 
 /* Stop logging to nvram after first fatal error */
 static int logging_enabled; /* Until we initialize everything,
@@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
 
 static int __init rtasmsgs_setup(char *str)
 {
-   if (strcmp(str, "on") == 0)
-   full_rtas_msgs = 1;
-   else if (strcmp(str, "off") == 0)
-   full_rtas_msgs = 0;
-
-   return 1;
+   return kstrtobool(str, 0, &full_rtas_msgs);
 }
 __setup("rtasmsgs=", rtasmsgs_setup);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 32274f72fe3f..b9787cae4108 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = 
CPU_STATE_OFFLINE;
 
 static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
 
-static int cede_offline_enabled __read_mostly = 1;
+static bool cede_offline_enabled __read_mostly = true;
 
 /*
  * Enable/disable cede_offline when available.
  */
 static int __init setup_cede_offline(char *str)
 {
-   if (!strcmp(str, "off"))
-   cede_offline_enabled = 0;
-   else if (!strcmp(str, "on"))
-   cede_offline_enabled = 1;
-   else
-   return 0;
-   return 1;
+   return kstrtobool(str, 0, &cede_offline_enabled);
 }
 
 __setup("cede_offline=", setup_cede_offline);
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 99f84ac31307..dff6ce1b84b2 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
 /*
  * Server Time Protocol (STP) code.
  */
-static int stp_online;
+static bool stp_online;
 static struct stp_sstpi stp_info;
 static void *stp_page;
 
@@ -1444,11 +1444,7 @@ static struct timer_list stp_timer;
 
 static int __init early_parse_stp(char *p)
 {
-   if (strncmp(p, "off", 3) == 0)
-   stp_online = 0;
-   else if (strncmp(p, "on", 2) == 0)
-   stp_online = 1;
-   return 0;
+   return kstrtobool(p, 0, &stp_online);
 }
 early_param("stp", early_parse_stp);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 40b8102fdadb..5d8a80651f61 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -37,7 +37,7 @@ static void set_topology_timer(void);
 static void topology_work_fn(struct work_struct *work);
 static struct sysinfo_15_1_x *tl_info;
 
-static int topology_enabled = 1;
+static bool topology_enabled = true;
 static DECLARE_WORK(topology_work, topology_work_fn);
 
 /*
@@ -444,10 +444,7 @@ static const struct cpumask *cpu_book_mask(int cpu)
 
 static int __init early_parse_topology(char *p)
 {
-   if (strncmp(p, "off", 3))
-   return 0;
-   topology_enabled = 0;
-   return 0;
+   return kstrtobool(p, 0, &topology_enabled);
 }
 early_param("topology", early_parse_topology);
 
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 6e85f713641d..6b423754083a 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -227,19 +227,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
*valid_agp)
return 0;
 }
 
-static int gart_fix_e820 __initdata = 1;
+static bool gart_fix_e820 __initdata = true;
 
 static int __init parse_gart_mem(char *p)
 {
-   if (!p)
-   return -EINVAL;
-
-   if (!strncmp(p, "off", 3))
-   gart_fix_e820 = 0;
-   else if (!strncmp(p, "on", 2))
-   gart_fix_e820 = 1;
-
-   return 0;
+   return kstrtobool(p, 0, &gart_fix_e820);
 }
 

[PATCH v2 3/4] lib: add "on"/"off" support to kstrtobool

2016-02-04 Thread Kees Cook
Add support for "on" and "off" when converting to boolean.

Signed-off-by: Kees Cook 
---
 lib/kstrtox.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index e18f088704d7..09e83a19a96d 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -347,6 +347,20 @@ int kstrtobool(const char *s, unsigned int base, bool *res)
case '0':
*res = false;
return 0;
+   case 'o':
+   case 'O':
+   switch (s[1]) {
+   case 'n':
+   case 'N':
+   *res = true;
+   return 0;
+   case 'f':
+   case 'F':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
default:
break;
}
-- 
2.6.3
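As a quick illustration, here is a user-space model of the resulting parser
(simplified sketch of my own, not part of the patch; -1 stands in for -EINVAL
and, like the kernel version, it simply ignores the base argument):

  #include <stdbool.h>
  #include <stdio.h>

  /* user-space model of kstrtobool() with "on"/"off" support */
  static int model_kstrtobool(const char *s, bool *res)
  {
        if (!s)
                return -1;

        switch (s[0]) {
        case 'y': case 'Y': case '1':
                *res = true;
                return 0;
        case 'n': case 'N': case '0':
                *res = false;
                return 0;
        case 'o': case 'O':
                switch (s[1]) {
                case 'n': case 'N':
                        *res = true;
                        return 0;
                case 'f': case 'F':
                        *res = false;
                        return 0;
                default:
                        break;
                }
                break;
        default:
                break;
        }
        return -1;
  }

  int main(void)
  {
        const char *tests[] = { "y", "N", "1", "on", "Off", "maybe" };
        unsigned int i;

        for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
                bool v = false;
                int rc = model_kstrtobool(tests[i], &v);

                printf("%-6s -> rc=%d val=%d\n", tests[i], rc, (int)v);
        }
        return 0;
  }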



[PATCH v2 2/4] lib: update single-char callers of strtobool

2016-02-04 Thread Kees Cook
Some callers of strtobool were passing a pointer to unterminated strings.
In preparation of adding multi-character processing to kstrtobool, update
the callers to not pass single-character pointers, and switch to using the
new kstrtobool_from_user helper where possible.

Signed-off-by: Kees Cook 
Cc: Amitkumar Karwar 
Cc: Nishant Sarmukadam 
Cc: Kalle Valo 
Cc: Steve French 
Cc: linux-c...@vger.kernel.org
---
 drivers/net/wireless/marvell/mwifiex/debugfs.c | 10 ++---
 fs/cifs/cifs_debug.c   | 58 +++---
 fs/cifs/cifs_debug.h   |  2 +-
 fs/cifs/cifsfs.c   |  6 +--
 fs/cifs/cifsglob.h |  4 +-
 5 files changed, 26 insertions(+), 54 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
b/drivers/net/wireless/marvell/mwifiex/debugfs.c
index 0b9c580af988..bd061b02bc04 100644
--- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
+++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
@@ -880,14 +880,12 @@ mwifiex_reset_write(struct file *file,
 {
struct mwifiex_private *priv = file->private_data;
struct mwifiex_adapter *adapter = priv->adapter;
-   char cmd;
bool result;
+   int rc;
 
-   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
-   return -EFAULT;
-
-   if (strtobool(&cmd, &result))
-   return -EINVAL;
+   rc = kstrtobool_from_user(ubuf, count, 0, &result);
+   if (rc)
+   return rc;
 
if (!result)
return -EINVAL;
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 50b268483302..6ee59abcb69b 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -255,7 +255,6 @@ static const struct file_operations 
cifs_debug_data_proc_fops = {
 static ssize_t cifs_stats_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
bool bv;
int rc;
struct list_head *tmp1, *tmp2, *tmp3;
@@ -263,11 +262,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
struct cifs_ses *ses;
struct cifs_tcon *tcon;
 
-   rc = get_user(c, buffer);
-   if (rc)
-   return rc;
-
-   if (strtobool(&c, &bv) == 0) {
+   rc = kstrtobool_from_user(buffer, count, 0, &bv);
+   if (rc == 0) {
 #ifdef CONFIG_CIFS_STATS2
atomic_set(, 0);
atomic_set(, 0);
@@ -290,6 +286,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
}
}
spin_unlock(&cifs_tcp_ses_lock);
+   } else {
+   return rc;
}
 
return count;
@@ -433,17 +431,17 @@ static int cifsFYI_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t cifsFYI_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
 {
-   char c;
+   char c[2] = { '\0' };
bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = get_user(c[0], buffer);
if (rc)
return rc;
-   if (strtobool(, ) == 0)
+   if (strtobool(c, &bv) == 0)
cifsFYI = bv;
-   else if ((c > '1') && (c <= '9'))
-   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings */
+   else if ((c[0] > '1') && (c[0] <= '9'))
+   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for meanings 
*/
 
return count;
 }
@@ -471,20 +469,12 @@ static int cifs_linux_ext_proc_open(struct inode *inode, 
struct file *file)
 static ssize_t cifs_linux_ext_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = kstrtobool_from_user(buffer, count, 0, &linuxExtEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   linuxExtEnabled = bv;
-
return count;
 }
 
@@ -511,20 +501,12 @@ static int cifs_lookup_cache_proc_open(struct inode 
*inode, struct file *file)
 static ssize_t cifs_lookup_cache_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = kstrtobool_from_user(buffer, count, 0, &lookupCacheEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   lookupCacheEnabled = bv;
-
return count;
 }
 
@@ -551,20 +533,12 @@ static int traceSMB_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t traceSMB_proc_write(struct file *file, const char __user 
*buffer,
size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, 

Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Linus Torvalds
I missed the original email (I don't have net-devel in my mailbox),
but based on Ingo's quoting have a more fundamental question:

Why wasn't that done with C code instead of asm with odd numerical targets?

It seems likely that the real issue is avoiding the short loops (that
will cause branch prediction problems) and use a lookup table instead.

But we can probably do better than that asm.

For example, for the actual "8 bytes or shorter" case, I think
something like this might just work fine:

  unsigned long csum_partial_8orless(const unsigned char *buf,
unsigned long len, unsigned long sum)
  {
static const unsigned long mask[9] = {
0x,
0xff00,
0x,
0xff00,
0x,
0xff00,
0x,
0xff00,
0x };
unsigned long val = load_unaligned_zeropad(buf + (len & 1));
val &= mask[len];
asm("addq %1,%0 ; adcq $0,%0":"=r" (sum):"r" (val), "0" (sum));
return sum;
  }

NOTE! The above is 100% untested. But I _think_ it may handle the
odd-byte-case correctly, and it should result in just one 8-byte load
(the "load_unaligned_zeropad()" is just in case that ends up
overflowing and we have page-alloc-debug triggering a page fault at
the end). All without looping or any conditional branches that might
mispredict.

My point is that going to assembly results in pretty hard-to-read
code, but it's also fairly inflexible. If we stay with C, we can much
more easily play tricks. So for example, make the above an inline
function, and now you can do things like this:

  static inline unsigned long csum_partial_64bit(void *buf, unsigned
long len, unsigned long sum)
  {
if (len <= 8)
return csum_partial_8orless(buf, len, sum);

/* We know it's larger than 8 bytes, so handle alignment */
align = 7 & -(unsigned long)buf;
sum = csum_partial_8orless(buf, align, sum);
buf += align;

/* We need to do the big-endian thing */
sum = rotate_by8_if_odd(sum, align);

/* main loop for big packets */
.. do the unrolled inline asm thing that we already do ..

sum = rotate_by8_if_odd(sum, align);

/* Handle the last bytes */
return csum_partial_8orless(buf, len, sum);
  }

  /* Fold the 64-bit sum we computed down to 32 bits __wsum */
  __wsum int csum_partial(void *buf, unsigned int len, __wsum partial)
  {
unsigned long sum = csum_partial_64bit(ptr, len, partial);
asm("addl %1,%0 ; adcl $0,%0":"=r" (sum):"r" (sum >> 32), "0" (sum));
return sum;
 }

or something like that.

NOTE NOTE NOTE! I did a half-arsed attempt at getting the whole
"big-endian 16-bit add" thing right by doing the odd byte masking in
the end cases, and by rotating the sum by 8 bits around the
8-byte-unrolled-loop, but I didn't test the above. It's literally
written in my mail client. So I can almost guarantee that it is buggy,
but it is meant as an *example* of "why not do it this way" rather
than production code.

I think that writing it in C and trying to be intelligent about it
like the above would result in more maintainable code, and it is
possible that it would even be faster.

Yes, writing it in C *does* likely result in a few more cases of "adcq
$0" in order to finish up the carry calculations. The *only* advantage
of inline asm is how it allows you to keep the carry flag around. So
there is downside to the C model, and it might cause a cycle or two of
extra work, but the upside of C is that you can try to do clever
things without turning the code completely unreadable.

For example, doing the exception handling (that will never actually
trigger) for the "let's just do a 8-byte load" is just painful in
assembly. In C, we already have the helper function to do it.

Hmm? Would somebody be willing to take the likely very buggy code
above, and test it and make it work? I assume there's a test harness
for the whole csum thing somewhere.

 Linus
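Since a test harness is mentioned above, a minimal user-space reference
(RFC 1071 style) that candidate csum_partial() variants can be compared
against, plus a plain-C version of the final end-around-carry narrowing that
the adcl sequence performs. Names are mine, a little-endian host is assumed,
and this is a sketch rather than anyone's proposed implementation:

  #include <stdint.h>
  #include <string.h>

  /* byte-stream 16-bit one's-complement sum (reference, little-endian host) */
  static uint16_t ref_csum16(const unsigned char *buf, size_t len)
  {
        uint32_t sum = 0;

        while (len > 1) {
                uint16_t w;

                memcpy(&w, buf, 2);     /* sidesteps alignment questions */
                sum += w;
                buf += 2;
                len -= 2;
        }
        if (len)                        /* trailing odd byte */
                sum += *buf;

        while (sum >> 16)               /* end-around carry */
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
  }

  /* plain-C equivalent of the 64->32->16 folding done with adcl above */
  static uint16_t fold64(uint64_t sum)
  {
        sum = (sum & 0xffffffffULL) + (sum >> 32);
        sum = (sum & 0xffffffffULL) + (sum >> 32);
        sum = (sum & 0xffffULL) + (sum >> 16);
        sum = (sum & 0xffffULL) + (sum >> 16);
        return (uint16_t)sum;
  }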


Re: [PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 2:43 PM, Andy Shevchenko
 wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
>> Create the kstrtobool_from_user helper and moves strtobool logic into
>> the new kstrtobool (matching all the other kstrto* functions). Provides
>> an inline wrapper for existing strtobool callers.
>>
>> Signed-off-by: Kees Cook 
>
> Reviewed-by: Andy Shevchenko 
>
> One minor below.

Thanks!

>
>> ---
>>  include/linux/kernel.h |  3 +++
>>  include/linux/string.h |  6 +-
>>  lib/kstrtox.c  | 35 +++
>>  lib/string.c   | 29 -
>>  4 files changed, 43 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
>> index f31638c6e873..cdc25f47a23f 100644
>> --- a/include/linux/kernel.h
>> +++ b/include/linux/kernel.h
>> @@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
>> base, u16 *res);
>>  int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
>>  int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
>>  int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
>> +int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
>>
>>  int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
>> unsigned int base, unsigned long long *res);
>>  int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
>> unsigned int base, long long *res);
>> @@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user 
>> *s, size_t count, unsigne
>>  int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
>> unsigned int base, s16 *res);
>>  int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
>> unsigned int base, u8 *res);
>>  int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
>> unsigned int base, s8 *res);
>
>> +int __must_check kstrtobool_from_user(const char __user *s, size_t count,
>> + unsigned int base, bool *res);
>
> We already are using long lines here, perhaps do the same?

I went back and forth on that, and decided that between checkpatch
yelling at me, and trying to be an agent of less entropy, I wrapped
the definition. I am fine either way, though.

-Kees

>
>>
>>  static inline int __must_check kstrtou64_from_user(const char __user *s, 
>> size_t count, unsigned int base, u64 *res)
>>  {
>> diff --git a/include/linux/string.h b/include/linux/string.h
>> index 9eebc66d957a..d2fb21b1081d 100644
>> --- a/include/linux/string.h
>> +++ b/include/linux/string.h
>> @@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, 
>> int *argcp);
>>  extern void argv_free(char **argv);
>>
>>  extern bool sysfs_streq(const char *s1, const char *s2);
>> -extern int strtobool(const char *s, bool *res);
>> +extern int kstrtobool(const char *s, unsigned int base, bool *res);
>> +static inline int strtobool(const char *s, bool *res)
>> +{
>> +   return kstrtobool(s, 0, res);
>> +}
>>
>>  #ifdef CONFIG_BINARY_PRINTF
>>  int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
>> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
>> index 94be244e8441..e18f088704d7 100644
>> --- a/lib/kstrtox.c
>> +++ b/lib/kstrtox.c
>> @@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
>>  }
>>  EXPORT_SYMBOL(kstrtos8);
>>
>> +/**
>> + * kstrtobool - convert common user inputs into boolean values
>> + * @s: input string
>> + * @base: ignored
>> + * @res: result
>> + *
>> + * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
>> + * Otherwise it will return -EINVAL.  Value pointed to by res is
>> + * updated upon finding a match.
>> + */
>> +int kstrtobool(const char *s, unsigned int base, bool *res)
>> +{
>> +   if (!s)
>> +   return -EINVAL;
>> +
>> +   switch (s[0]) {
>> +   case 'y':
>> +   case 'Y':
>> +   case '1':
>> +   *res = true;
>> +   return 0;
>> +   case 'n':
>> +   case 'N':
>> +   case '0':
>> +   *res = false;
>> +   return 0;
>> +   default:
>> +   break;
>> +   }
>> +
>> +   return -EINVAL;
>> +}
>> +EXPORT_SYMBOL(kstrtobool);
>> +
>>  #define kstrto_from_user(f, g, type)   \
>>  int f(const char __user *s, size_t count, unsigned int base, type *res) 
>>\
>>  {  \
>> @@ -345,3 +379,4 @@ kstrto_from_user(kstrtou16_from_user,   kstrtou16,   
>>u16);
>>  kstrto_from_user(kstrtos16_from_user,  kstrtos16,  s16);
>>  kstrto_from_user(kstrtou8_from_user,   kstrtou8,   u8);
>>  kstrto_from_user(kstrtos8_from_user,   kstrtos8,   s8);
>> +kstrto_from_user(kstrtobool_from_user, 

Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Linus Torvalds
On Thu, Feb 4, 2016 at 2:43 PM, Tom Herbert  wrote:
>
> The reason I did this in assembly is precisely about the your point of
> having to close the carry chains with adcq $0. I do have a first
> implementation in C which using switch() to handle alignment, excess
> length less than 8 bytes, and the odd number of quads to sum in the
> main loop. gcc turns these switch statements into jump tables (not
> function tables which is what Ingo's example code was using). The
> problem I hit was that for each case I needed to close the carry chain
> in the inline asm so fall through wouldn't have much value and each
> case is expanded. The C version using switch gave a nice performance
> gain, moving to all assembly was somewhat better.

Yeah,. so I _think_ that if my approach works, the code generated for
the 0-8 byte case looks just something like

movq %rsi, %rax
andl $1, %eax
subq %rax, %rdi
movq %rdx, %rax
movq (%rdi), %rcx
andq mask(,%rsi,8), %rcx
addq %rcx,%rax
adcq $0,%rax

which is pretty close to optimal (that's with my hopefully fixed version).

That's not with the final narrowing from 64-bit to 32-bit, but it
looks like it should perform well on most x86 cores, and then the only
remaining branches end up being the actual size checking ones.

And yes, you could avoid a few "adc $0" instructions in asm, but quite
frankly, I think that's a tradeoff that is worth it.

My guess is that you can get to within a percent of optimal with the C
approach, and I really think it makes it easier to try different
things (ie things like the above that avoids the switch table
entirely)

> There is also question of alignment. I f we really don't need to worry
> about alignment at all on x86, then we should be able to eliminate the
> complexity of dealing with it.

So most x86 alignment issues are about crossing the cache bank size,
which is usually 16 or 32 bytes. Unaligned accesses *within* one of
those banks should be basically free (there's a whoppign big byte lane
shifter, so there's lots of hardware support for that).

Also, even when you do cross a cache bank, it's usually pretty cheap.
It extra cycles, but it's generally *less* extra cycles than it would
be to try to align things in software and doing two accesses.

The rule of thumb is that you should never worry about _individual_
unaligned accesses. It's only really worth it aligning things before
biggish loops. So aligning to an 8-byte boundary before you then do a
lot of "adcq" instructions makes sense, but worrying about unaligned
accesses for the beginning/end does generally not.

>>> For example, for the actual "8 bytes or shorter" case, I think
>> something like this might just work fine: [ snip ]
>
> I will look at doing that.

So I already found and fixed one bug in that approach, but I still
think it's a viable model.

But you will almost certainly have to fix a few more of my bugs before
it really works ;)

   Linus


Re: [PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Andy Shevchenko
On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
> This changes several users of manual "on"/"off" parsing to use strtobool.
> (Which means they will now parse y/n/1/0 meaningfully too.)
>

I like this change, but can you carefully check how the returned value
is handled by each caller?
At a quick look I saw both 1 and 0 treated as okay in different places.
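To make the concern concrete: if I remember the convention right, a __setup()
handler returns 1 to tell the early option parser that it consumed the
argument, while kstrtobool() returns 0 or -EINVAL, so returning its result
directly changes what the core sees. A hedged sketch of one way to keep the
old contract (the pr_warn() and its wording are my own, not something
proposed in this series):

  static bool full_rtas_msgs;

  static int __init rtasmsgs_setup(char *str)
  {
        if (kstrtobool(str, 0, &full_rtas_msgs))
                pr_warn("rtasmsgs=: invalid value '%s'\n", str);
        return 1;       /* always claim the option, as the old code did */
  }
  __setup("rtasmsgs=", rtasmsgs_setup);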


> Signed-off-by: Kees Cook 
> Acked-by: Heiko Carstens 
> Acked-by: Michael Ellerman 
> Cc: x...@kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> ---
>  arch/powerpc/kernel/rtasd.c  |  9 ++---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
>  arch/s390/kernel/time.c  |  8 ++--
>  arch/s390/kernel/topology.c  |  7 ++-
>  arch/x86/kernel/aperture_64.c| 12 ++--
>  include/linux/tick.h |  2 +-
>  kernel/time/hrtimer.c| 10 ++
>  kernel/time/tick-sched.c | 10 ++
>  8 files changed, 15 insertions(+), 53 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 5a2c049c1c61..567ed5a2f43a 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
>  static unsigned int event_scan;
>  static unsigned int rtas_event_scan_rate;
>
> -static int full_rtas_msgs = 0;
> +static bool full_rtas_msgs;
>
>  /* Stop logging to nvram after first fatal error */
>  static int logging_enabled; /* Until we initialize everything,
> @@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
>
>  static int __init rtasmsgs_setup(char *str)
>  {
> -   if (strcmp(str, "on") == 0)
> -   full_rtas_msgs = 1;
> -   else if (strcmp(str, "off") == 0)
> -   full_rtas_msgs = 0;
> -
> -   return 1;
> +   return kstrtobool(str, 0, &full_rtas_msgs);
>  }
>  __setup("rtasmsgs=", rtasmsgs_setup);
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 32274f72fe3f..b9787cae4108 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, current_state) 
> = CPU_STATE_OFFLINE;
>
>  static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
>
> -static int cede_offline_enabled __read_mostly = 1;
> +static bool cede_offline_enabled __read_mostly = true;
>
>  /*
>   * Enable/disable cede_offline when available.
>   */
>  static int __init setup_cede_offline(char *str)
>  {
> -   if (!strcmp(str, "off"))
> -   cede_offline_enabled = 0;
> -   else if (!strcmp(str, "on"))
> -   cede_offline_enabled = 1;
> -   else
> -   return 0;
> -   return 1;
> +   return kstrtobool(str, 0, &cede_offline_enabled);
>  }
>
>  __setup("cede_offline=", setup_cede_offline);
> diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
> index 99f84ac31307..dff6ce1b84b2 100644
> --- a/arch/s390/kernel/time.c
> +++ b/arch/s390/kernel/time.c
> @@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
>  /*
>   * Server Time Protocol (STP) code.
>   */
> -static int stp_online;
> +static bool stp_online;
>  static struct stp_sstpi stp_info;
>  static void *stp_page;
>
> @@ -1444,11 +1444,7 @@ static struct timer_list stp_timer;
>
>  static int __init early_parse_stp(char *p)
>  {
> -   if (strncmp(p, "off", 3) == 0)
> -   stp_online = 0;
> -   else if (strncmp(p, "on", 2) == 0)
> -   stp_online = 1;
> -   return 0;
> +   return kstrtobool(p, 0, &stp_online);
>  }
>  early_param("stp", early_parse_stp);
>
> diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
> index 40b8102fdadb..5d8a80651f61 100644
> --- a/arch/s390/kernel/topology.c
> +++ b/arch/s390/kernel/topology.c
> @@ -37,7 +37,7 @@ static void set_topology_timer(void);
>  static void topology_work_fn(struct work_struct *work);
>  static struct sysinfo_15_1_x *tl_info;
>
> -static int topology_enabled = 1;
> +static bool topology_enabled = true;
>  static DECLARE_WORK(topology_work, topology_work_fn);
>
>  /*
> @@ -444,10 +444,7 @@ static const struct cpumask *cpu_book_mask(int cpu)
>
>  static int __init early_parse_topology(char *p)
>  {
> -   if (strncmp(p, "off", 3))
> -   return 0;
> -   topology_enabled = 0;
> -   return 0;
> +   return kstrtobool(p, 0, &topology_enabled);
>  }
>  early_param("topology", early_parse_topology);
>
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index 6e85f713641d..6b423754083a 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -227,19 +227,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
> *valid_agp)

Re: [PATCH 1/2] ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands

2016-02-04 Thread Keller, Jacob E
On Thu, 2016-02-04 at 17:53 -0500, David Miller wrote:
> From: Jacob Keller 
> Date: Tue,  2 Feb 2016 15:22:06 -0800
> 
> > Ethtool supports a few operations for modifying and controlling
> > a device's RSS table. Sometimes, changes in other features of the
> > device
> > may require (or desire) changes to the RSS table. Currently there
> > is no
> > method to indicate to the driver whether the current RSS table
> > settings
> > should be maintained or overridden.
> 
> Yes, there certainly is a way to indicate this.
> 
> If the user asks for the change in the number of queues, and you
> cannot retain the user's requested RSS settings, then you must fail
> the queue setting change.
> 
> And vice versa.
> 
> You can't say to the user "I can adhere to your requested
> configuration
> change, but I might undo it for some unspecified reason"
> 

The trouble here is the case where the indirection table configurations
are valid now, but a change in the number of queues happening at a
later time currently causes these settings to be lost.

> That's unacceptable behavior, and that's exactly what this dynamic
> flag means.
> 
> If you cannot give the user what he asks for, precisely and reliably,
> you fail the operation with an error.
> 
> There is no way I am adding code which allows these "maybe" kind of
> configuration operations.  Either you can or you can't, and you tell
> the user when you can't by erroring out on the operation that
> invalidates the requirements.
> 
> 

So you're suggesting instead to error out when the second operation
(changing the number of queues) would invalidate the current settings?

Current driver behaviors for all the drivers I checked work in one of
two ways.

1) changing queues will destroy the RSS table as it will be
reinitialized regardless of current settings

2) changing queues will maintain the RSS table if possible, unless the
previous RSS table can't function.

No driver currently fails this operation if the RSS table settings
can't be preserved. In addition, it results in weird behavior when a
driver sets the RSS table at load and then increases the number of queues
via an ethtool op: the result is that RSS does not use the new queues
added by the ethtool operation.

I can instead drop the ethtool changes and just have my driver record
when the user has changed the tables, and attempt to error on queue
setting operation, which may work.

Essentially the idea was to have a flag indicating "use the driver
defaults" which the driver can change as necessary when the number of
queues changes, or other factors that may require RSS table changes.

I can do this all hidden in the driver but then there is nothing
exposing how the driver will behave under this circumstance.

I'm all for a better suggestion, because I think what we're doing now
is wrong, and the proposed solutions so far don't seem right either.

If we preserve the RSS table when queues increase, then the user may be
confused because RSS settings won't spread to the new queues. If we
destroy the RSS settings the user will be possibly confused because
their selected RSS settings do not work. If we fail the setting of
queues when RSS table is not the default value, then a user might be
confused as to why they can't change the number of queues. Also, I am
unsure whether or not we can tell from the ethtool op function that we
actually are being "reset" to the default, vs just being set. This is
because the netlink message of "0 length" which indicates the default
simply has the ethtool core fill in a standard equal-weight default,
so I am not sure our driver can tell that it should now be OK to
enable queue changes.

So, something is missing in the current flow to allow this. I think the
best solution is simply to prevent changing the number of queues while we
have a non-default RSS setting, and to require RSS to be reset before the
queues can be changed.

Thoughts?

Regards,
Jake
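For what it's worth, a rough driver-side sketch of the "fail the queue
change" option described above; every name below is invented for
illustration and nothing here is proposed code. The idea is simply: if the
user installed a custom indirection table, refuse any channel count that
would strand one of its entries.

  static int foo_set_channels(struct net_device *netdev,
                              struct ethtool_channels *ch)
  {
        struct foo_priv *priv = netdev_priv(netdev);
        unsigned int i;

        if (priv->rss_user_configured) {
                for (i = 0; i < priv->rss_table_size; i++)
                        if (priv->rss_table[i] >= ch->combined_count)
                                return -EINVAL; /* entry would be stranded */
        }

        return foo_reconfigure_queues(priv, ch->combined_count);
  }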

Re: [PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-04 Thread Andy Shevchenko
On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
> Create the kstrtobool_from_user helper and moves strtobool logic into
> the new kstrtobool (matching all the other kstrto* functions). Provides
> an inline wrapper for existing strtobool callers.
>
> Signed-off-by: Kees Cook 

Reviewed-by: Andy Shevchenko 

One minor below.

> ---
>  include/linux/kernel.h |  3 +++
>  include/linux/string.h |  6 +-
>  lib/kstrtox.c  | 35 +++
>  lib/string.c   | 29 -
>  4 files changed, 43 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index f31638c6e873..cdc25f47a23f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
> base, u16 *res);
>  int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
>  int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
>  int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
> +int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
>
>  int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
> unsigned int base, unsigned long long *res);
>  int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
> unsigned int base, long long *res);
> @@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user 
> *s, size_t count, unsigne
>  int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
> unsigned int base, s16 *res);
>  int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
> unsigned int base, u8 *res);
>  int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
> unsigned int base, s8 *res);

> +int __must_check kstrtobool_from_user(const char __user *s, size_t count,
> + unsigned int base, bool *res);

We already are using long lines here, perhaps do the same?

>
>  static inline int __must_check kstrtou64_from_user(const char __user *s, 
> size_t count, unsigned int base, u64 *res)
>  {
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 9eebc66d957a..d2fb21b1081d 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, int 
> *argcp);
>  extern void argv_free(char **argv);
>
>  extern bool sysfs_streq(const char *s1, const char *s2);
> -extern int strtobool(const char *s, bool *res);
> +extern int kstrtobool(const char *s, unsigned int base, bool *res);
> +static inline int strtobool(const char *s, bool *res)
> +{
> +   return kstrtobool(s, 0, res);
> +}
>
>  #ifdef CONFIG_BINARY_PRINTF
>  int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index 94be244e8441..e18f088704d7 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
>  }
>  EXPORT_SYMBOL(kstrtos8);
>
> +/**
> + * kstrtobool - convert common user inputs into boolean values
> + * @s: input string
> + * @base: ignored
> + * @res: result
> + *
> + * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
> + * Otherwise it will return -EINVAL.  Value pointed to by res is
> + * updated upon finding a match.
> + */
> +int kstrtobool(const char *s, unsigned int base, bool *res)
> +{
> +   if (!s)
> +   return -EINVAL;
> +
> +   switch (s[0]) {
> +   case 'y':
> +   case 'Y':
> +   case '1':
> +   *res = true;
> +   return 0;
> +   case 'n':
> +   case 'N':
> +   case '0':
> +   *res = false;
> +   return 0;
> +   default:
> +   break;
> +   }
> +
> +   return -EINVAL;
> +}
> +EXPORT_SYMBOL(kstrtobool);
> +
>  #define kstrto_from_user(f, g, type)   \
>  int f(const char __user *s, size_t count, unsigned int base, type *res)  
>   \
>  {  \
> @@ -345,3 +379,4 @@ kstrto_from_user(kstrtou16_from_user,   kstrtou16,
>   u16);
>  kstrto_from_user(kstrtos16_from_user,  kstrtos16,  s16);
>  kstrto_from_user(kstrtou8_from_user,   kstrtou8,   u8);
>  kstrto_from_user(kstrtos8_from_user,   kstrtos8,   s8);
> +kstrto_from_user(kstrtobool_from_user, kstrtobool, bool);
> diff --git a/lib/string.c b/lib/string.c
> index 0323c0d5629a..1a90db9bc6e1 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -630,35 +630,6 @@ bool sysfs_streq(const char *s1, const char *s2)
>  }
>  EXPORT_SYMBOL(sysfs_streq);
>
> -/**
> - * strtobool - convert common user inputs into boolean values
> - * @s: input string
> - * @res: result
> - *
> - * This routine returns 0 iff the 
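For readers who have not looked at lib/kstrtox.c: the kstrto_from_user()
macro used above generates the *_from_user variant by copying a small,
NUL-terminated chunk from userspace and handing it to the in-kernel parser.
Roughly, and only as an approximation from memory rather than the exact
macro body, the bool instantiation behaves like:

  int kstrtobool_from_user(const char __user *s, size_t count,
                           unsigned int base, bool *res)
  {
        char buf[8];            /* room for "on"/"off"/"y"/"n"/"0"/"1" + NUL */

        count = min(count, sizeof(buf) - 1);
        if (copy_from_user(buf, s, count))
                return -EFAULT;
        buf[count] = '\0';
        return kstrtobool(buf, base, res);
  }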

[PATCH v2 0/4] lib: add "on" and "off" to strtobool

2016-02-04 Thread Kees Cook
This consolidates logic for handling "on"/"off" parsing for bools into
the strtobool function, by way of moving it into kstrtobool (with helpers),
and updating various callers.

 arch/powerpc/kernel/rtasd.c|9 ---
 arch/powerpc/platforms/pseries/hotplug-cpu.c   |   10 
 arch/s390/kernel/time.c|8 ---
 arch/s390/kernel/topology.c|7 ---
 arch/x86/kernel/aperture_64.c  |   12 -
 drivers/net/wireless/marvell/mwifiex/debugfs.c |   10 +---
 fs/cifs/cifs_debug.c   |   58 ++---
 fs/cifs/cifs_debug.h   |2 
 fs/cifs/cifsfs.c   |6 +-
 fs/cifs/cifsglob.h |4 -
 include/linux/kernel.h |3 +
 include/linux/string.h |6 ++
 include/linux/tick.h   |2 
 kernel/time/hrtimer.c  |   10 
 kernel/time/tick-sched.c   |   10 
 lib/kstrtox.c  |   49 +
 lib/string.c   |   29 
 17 files changed, 98 insertions(+), 137 deletions(-)

-Kees



[PATCH v3 net 2/7] net: mvneta: Fix the CPU choice in mvneta_percpu_elect

2016-02-04 Thread Gregory CLEMENT
Since the move to handling multiple RX queues, the
mvneta_percpu_elect function was broken. The use of the modulo can lead
to electing the wrong CPU. For example with rxq_def=2, if CPU 2 goes
offline and then online, we ended up with the third RX queue activated at
the same time on CPU 0 and CPU 2, which led to a kernel crash.

With this fix, we don't try to get "the closest" CPU if the default CPU is
gone; now we just use CPU 0, which is always there. Thanks to this, the
code becomes more readable, easier to maintain and more predictable.

Cc: sta...@vger.kernel.org
Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU")
Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 90ff5c7e19ea..4c2d12423750 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2849,9 +2849,14 @@ static void mvneta_percpu_disable(void *arg)
 
 static void mvneta_percpu_elect(struct mvneta_port *pp)
 {
-   int online_cpu_idx, max_cpu, cpu, i = 0;
+   int elected_cpu = 0, max_cpu, cpu, i = 0;
+
+   /* Use the cpu associated to the rxq when it is online, in all
+* the other cases, use the cpu 0 which can't be offline.
+*/
+   if (cpu_online(pp->rxq_def))
+   elected_cpu = pp->rxq_def;
 
-   online_cpu_idx = pp->rxq_def % num_online_cpus();
max_cpu = num_present_cpus();
 
for_each_online_cpu(cpu) {
@@ -2862,7 +2867,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
if ((rxq % max_cpu) == cpu)
rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
 
-   if (i == online_cpu_idx)
+   if (cpu == elected_cpu)
/* Map the default receive queue queue to the
 * elected CPU
 */
@@ -2873,7 +2878,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
 * the CPU bound to the default RX queue
 */
if (txq_number == 1)
-   txq_map = (i == online_cpu_idx) ?
+   txq_map = (cpu == elected_cpu) ?
MVNETA_CPU_TXQ_ACCESS(1) : 0;
else
txq_map = mvreg_read(pp, MVNETA_CPU_MAP(cpu)) &
-- 
2.5.0



[PATCH v3 net 4/7] net: mvneta: Remove unused code

2016-02-04 Thread Gregory CLEMENT
Since the commit 2dcf75e2793c ("net: mvneta: Associate RX queues with
each CPU") all the percpu irq are used and disabled at initialization, so
there is no point to disable them first.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index f496f9716569..74f8158df2b0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3014,14 +3014,6 @@ static int mvneta_open(struct net_device *dev)
goto err_cleanup_txqs;
}
 
-   /* Even though the documentation says that request_percpu_irq
-* doesn't enable the interrupts automatically, it actually
-* does so on the local CPU.
-*
-* Make sure it's disabled.
-*/
-   mvneta_percpu_disable(pp);
-
/* Enable per-CPU interrupt on all the CPU to handle our RX
 * queue interrupts
 */
-- 
2.5.0



Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Linus Torvalds
On Thu, Feb 4, 2016 at 1:46 PM, Linus Torvalds
 wrote:
>
> static const unsigned long mask[9] = {
> 0x,
> 0xff00,
> 0x,
> 0xff00,
> 0x,
> 0xff00,
> 0x,
> 0xff00,
> 0x };
> unsigned long val = load_unaligned_zeropad(buf + (len & 1));
> val &= mask[len];

Yeah, that was buggy. I knew it was likely buggy,  but that "buf +
(len & 1)" was just stupid.

The "+" should be "-", of course - the point is to shift up the value
by 8 bits for odd cases, and we need to load starting one byte early
for that. The idea is that we use the byte shifter in the load unit to
do some work for us.

And the bitmasks are the wrong way around for the odd cases - it's the
low byte that ends up being bogus for those cases.

So it should probably look something like

static const unsigned long mask[9] = {
0x,
0xff00,
0x,
0xff00,
0x,
0xff00,
0x,
0xff00,
0x };
unsigned long val = load_unaligned_zeropad(buf - (len & 1));
val &= mask[len];

and then it *might* work.

Still entirely and utterly untested, I just decided to look at it a
bit more and noticed my thinko.

Linus
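A quick user-space check of the even-length cases only, where the expected
result is unambiguous (the odd lengths depend on the rotate-by-8 convention
discussed above, so they are deliberately not asserted). The mask values
here are my own restatement of "keep the low len bytes", memcpy() stands in
for load_unaligned_zeropad(), and a little-endian host is assumed:

  #include <assert.h>
  #include <stdio.h>
  #include <string.h>

  static const unsigned long long mask_even[9] = {
        [0] = 0x0000000000000000ULL,
        [2] = 0x000000000000ffffULL,
        [4] = 0x00000000ffffffffULL,
        [6] = 0x0000ffffffffffffULL,
        [8] = 0xffffffffffffffffULL,
  };

  int main(void)
  {
        unsigned char buf[16] = { 0x11, 0x22, 0x33, 0x44,
                                  0x55, 0x66, 0x77, 0x88 };
        size_t len;

        for (len = 0; len <= 8; len += 2) {
                unsigned long long val, want = 0;
                size_t i;

                memcpy(&val, buf, 8);           /* the single 8-byte load */
                val &= mask_even[len];

                for (i = 0; i < len; i++)       /* little-endian expectation */
                        want |= (unsigned long long)buf[i] << (8 * i);

                assert(val == want);
                printf("len %zu ok: 0x%016llx\n", len, val);
        }
        return 0;
  }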


[PATCH v3 net 1/7] net: mvneta: Fix for_each_present_cpu usage

2016-02-04 Thread Gregory CLEMENT
This patch converts the for_each_present_cpu usage to on_each_cpu: instead of
applying to the present CPUs, the calls will be applied only to the online CPUs.
This fixes a bug reported at
http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173.

Using the macro on_each_cpu (instead of a for_each_* loop) also ensures
that all the calls will be done all at once.

Fixes: f86428854480 ("net: mvneta: Statically assign queues to CPUs")
Reported-by: Stefan Roese 
Suggested-by: Jisheng Zhang 
Suggested-by: Russell King 
Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 662c2ee268c7..90ff5c7e19ea 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2564,7 +2564,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
mvneta_port_enable(pp);
 
/* Enable polling on the port */
-   for_each_present_cpu(cpu) {
+   for_each_online_cpu(cpu) {
struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
napi_enable(&port->napi);
@@ -2589,7 +2589,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
 
phy_stop(pp->phy_dev);
 
-   for_each_present_cpu(cpu) {
+   for_each_online_cpu(cpu) {
struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
napi_disable(&port->napi);
@@ -3057,13 +3057,11 @@ err_cleanup_rxqs:
 static int mvneta_stop(struct net_device *dev)
 {
struct mvneta_port *pp = netdev_priv(dev);
-   int cpu;
 
mvneta_stop_dev(pp);
mvneta_mdio_remove(pp);
unregister_cpu_notifier(&pp->cpu_notifier);
-   for_each_present_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
+   on_each_cpu(mvneta_percpu_disable, pp, true);
free_percpu_irq(dev->irq, pp->ports);
mvneta_cleanup_rxqs(pp);
mvneta_cleanup_txqs(pp);
-- 
2.5.0



[PATCH v3 net 0/7] mvneta fixes for SMP

2016-02-04 Thread Gregory CLEMENT
Hi,

Following this bug report:
http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173 and the
suggestions from Russell King, I reviewed all the code involving
multi-CPU. It ended up with this series of patches, which should improve
the stability of the driver.

During my tests I found another bug, which is fixed by a new patch (the
second one of this new version of the series).

The first two patches fix real bugs; the others fix potential issues
in the driver.

Thanks,

Gregory

Changelog:

v1 -> v2
Fix spinlock comment. Pointed by David Miller

v2 -> v3
 - Fix typos and mistake in the comments. Pointed by Sergei Shtylyov
 - Add a new patch fixing the CPU choice in mvneta_percpu_elect
 - Use lock in last patch to prevent remaining race condition. Pointed
   by Jisheng

Gregory CLEMENT (7):
  net: mvneta: Fix for_each_present_cpu usage
  net: mvneta: Fix the CPU choice in mvneta_percpu_elect
  net: mvneta: Use on_each_cpu when possible
  net: mvneta: Remove unused code
  net: mvneta: Modify the queue related fields from each cpu
  net: mvneta: The mvneta_percpu_elect function should be atomic
  net: mvneta: Fix race condition during stopping

 drivers/net/ethernet/marvell/mvneta.c | 184 +++---
 1 file changed, 101 insertions(+), 83 deletions(-)

-- 
2.5.0



[PATCH v3 net 3/7] net: mvneta: Use on_each_cpu when possible

2016-02-04 Thread Gregory CLEMENT
Instead of using a for_each_* loop in which we just call
smp_call_function_single, it is simpler to directly use the
on_each_cpu macro. Moreover, this macro ensures that the calls will be
done all at once.

Suggested-by: Russell King 
Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 4c2d12423750..f496f9716569 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2555,7 +2555,7 @@ static void mvneta_percpu_mask_interrupt(void *arg)
 
 static void mvneta_start_dev(struct mvneta_port *pp)
 {
-   unsigned int cpu;
+   int cpu;
 
mvneta_max_rx_size_set(pp, pp->pkt_size);
mvneta_txq_max_tx_size_set(pp, pp->pkt_size);
@@ -2571,9 +2571,8 @@ static void mvneta_start_dev(struct mvneta_port *pp)
}
 
/* Unmask interrupts. It has to be done from each CPU */
-   for_each_online_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
-pp, true);
+   on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+
mvreg_write(pp, MVNETA_INTR_MISC_MASK,
MVNETA_CAUSE_PHY_STATUS_CHANGE |
MVNETA_CAUSE_LINK_CHANGE |
@@ -2993,7 +2992,7 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
 static int mvneta_open(struct net_device *dev)
 {
struct mvneta_port *pp = netdev_priv(dev);
-   int ret, cpu;
+   int ret;
 
pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
pp->frag_size = SKB_DATA_ALIGN(MVNETA_RX_BUF_SIZE(pp->pkt_size)) +
@@ -3026,9 +3025,7 @@ static int mvneta_open(struct net_device *dev)
/* Enable per-CPU interrupt on all the CPU to handle our RX
 * queue interrupts
 */
-   for_each_online_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_enable,
-pp, true);
+   on_each_cpu(mvneta_percpu_enable, pp, true);
 
 
/* Register a CPU notifier to handle the case where our CPU
@@ -3315,9 +3312,7 @@ static int  mvneta_config_rss(struct mvneta_port *pp)
 
netif_tx_stop_all_queues(pp->dev);
 
-   for_each_online_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_mask_interrupt,
-pp, true);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
 
/* We have to synchronise on the napi of each CPU */
for_each_online_cpu(cpu) {
-- 
2.5.0



Re: [RFC RESEND] iwlwifi: pcie: transmit queue auto-sizing

2016-02-04 Thread Dave Taht
I am not on linux-wireless nor netdev presently, but...

On Thu, Feb 4, 2016 at 12:16 PM, Emmanuel Grumbach
 wrote:
> As many (all?) WiFi devices, Intel WiFi devices have
> transmit queues which have 256 transmit descriptors
> each and each descriptor corresponds to an MPDU.
> This means that when it is full, the queue contains
> 256 * ~1500 bytes to be transmitted (if we don't have
> A-MSDUs). The purpose of those queues is to have enough
> packets to be ready for transmission so that when the device
> gets an opportunity to transmit (TxOP), it can take as many
> packets as the spec allows and aggregate them into one
> A-MPDU or even several A-MPDUs if we are using bursts.
>
> The problem is that the packets that are in these queues are
> already out of control of the Qdisc and can stay in those
> queues for a fairly long time when the link condition is
> not good. This leads to the well known bufferbloat problem.
>
> This patch adds a way to tune the size of the transmit queue
> so that it won't cause excessive latency. When the link
> condition is good, the packets will flow smoothly and the
> transmit queue will grow quickly allowing A-MPDUs and
> maximal throughput. When the link is not optimal, we will
> have retransmissions, lower transmit rates or signal
> detection (CCA) which will cause a delay in the packet
> transmission. The driver will sense this higher latency
> and will reduce the size of the transmit queue.
> This means that the packets that continue to arrive
> will pile up in the Qdisc rather than in the device
> queues. The major advantage of this approach is that
> codel can now do its work.
>
> The algorithm is really (too?) simple:
> every 5 seconds, starts from a short queue again.
> If a packet has been in the queue for less than 10ms,
> allow 10 more MPDUs in.
> If a packet has been in the queue for more than 20ms,
> reduce by 10 the size of the transmit queue.
>
> The implementation is really naive and way too simple:
>  * reading jiffies for every Tx / Tx status is not a
>good idea.
>  * jiffies are not fine-grained enough on all platforms
>  * the constants chosen are really arbitrary and can't be
>tuned.
>  * This may be implemented in mac80211 probably and help
>other drivers.
>  * etc...
>
> But already this gives nice results. I ran a very simple
> experiment: I put the device in a controlled environment
> and ran traffic while running default sized ping in the
> background. In this configuration, our device quickly
> raises its transmission rate to the best rate.
> Then, I force the device to use the lowest rate (6Mbps).
> Of course, the throughput collapses, but the ping RTT
> shoots up.
> Using codel helps, but the latency is still high. Codel
> with this patch gives much better results:
>
> pfifo_fast:
> rtt min/avg/max/mdev = 1932.616/2393.284/2833.407/315.941 ms, pipe 3, 
> ipg/ewma 2215.707/2446.884 ms
>
> fq_codel + Tx queue auto-sizing:
> rtt min/avg/max/mdev = 13.541/32.396/54.791/9.610 ms, ipg/ewma 200.685/32.202 
> ms
>
> fq_codel without Tx queue auto-sizing:
> rtt min/avg/max/mdev = 140.821/257.303/331.889/31.074 ms, pipe 2, ipg/ewma 
> 258.147/252.847 ms

This is a dramatic improvement. But I'm not sure what you are
measuring. Is this the 6mbit test? What happens when you send traffic
the other way (more pure acks, rather than big packets?)

I try to encourage folk to use flent whenever possible, for pretty
graphs and long term measurements, so you can simultaneously measure
both throughput and latency.

flent.org's .14 release just shipped.

> Clearly, there is more work to do to be able to merge this,
> but it seems that the wireless problems mentioned in
> https://lwn.net/Articles/616241/ may have a solution.

I gave talks on the problems that wifi had with bufferbloat at the
ieee 802.11 wg meeting a while back, and more recently it was filmed
at battlemesh.

https://www.youtube.com/watch?v=Rb-UnHDw02o

I have spent my time since trying to raise sufficient resources
(testbeds and test tools), orgs, people and money to tackle these
problems at more depth. We made a bit of progress recently which I can
talk about offline...

In that talk I suggested that overall we move towards timestamping
everything, that (at least in the case of the ath9k and mt72) we tie
together aggregation with a byte based estimator similar to how BQL
works, and I hoped that eventually - we'd be able to basically - at
low rates, keep no more than one aggregate in the hardware, one in the
driver queue, and one being assembled. The pending aggregate would be
sent to the hardware on the completion interrupt for the previous
aggregate, which would fire off the size estimator and start
aggrefying the one being assembled.

A hook to do that is in use on the mt72 chipset that felix is working
on... but nowhere else so far as I know (as yet).

the iwl does it's own aggregation (I think(?))... but estimates can
still be made...

There are WAY more 
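Since the heuristic quoted at the top of this mail is only described in
prose, a stand-alone model of it may help; the constants and names below are
mine, jiffies are replaced by a caller-supplied millisecond clock, and this
is in no way the iwlwifi code:

  #include <stdbool.h>

  #define TXQ_MIN               10
  #define TXQ_MAX               256
  #define GROW_THRESH_MS        10
  #define SHRINK_THRESH_MS      20
  #define RESET_PERIOD_MS       5000

  struct txq_model {
        int limit;                      /* max MPDUs allowed in flight */
        unsigned long last_reset_ms;
  };

  /* called for every tx status, with the packet's enqueue timestamp */
  static void txq_on_tx_status(struct txq_model *q, unsigned long now_ms,
                               unsigned long enqueue_ms)
  {
        unsigned long sojourn = now_ms - enqueue_ms;

        if (now_ms - q->last_reset_ms > RESET_PERIOD_MS) {
                q->limit = TXQ_MIN;     /* start from a short queue again */
                q->last_reset_ms = now_ms;
                return;
        }

        if (sojourn < GROW_THRESH_MS && q->limit <= TXQ_MAX - 10)
                q->limit += 10;         /* link looks healthy, let more in */
        else if (sojourn > SHRINK_THRESH_MS && q->limit >= TXQ_MIN + 10)
                q->limit -= 10;         /* latency building up, back off */
  }

  static bool txq_may_enqueue(const struct txq_model *q, int in_flight)
  {
        return in_flight < q->limit;
  }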

Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Tom Herbert
On Thu, Feb 4, 2016 at 9:09 AM, David Laight  wrote:
> From: Tom Herbert
> ...
>> > If nothing else reducing the size of this main loop may be desirable.
>> > I know the newer x86 is supposed to have a loop buffer so that it can
>> > basically loop on already decoded instructions.  Normally it is only
>> > something like 64 or 128 bytes in size though.  You might find that
>> > reducing this loop to that smaller size may improve the performance
>> > for larger payloads.
>>
>> I saw 128 to be better in my testing. For large packets this loop does
>> all the work. I see performance dependent on the amount of loop
>> overhead, i.e. we got it down to two non-adcq instructions but it is
>> still noticeable. Also, this helps a lot on sizes up to 128 bytes
>> since we only need to do single call in the jump table and no trip
>> through the loop.
>
> But one of your 'loop overhead' instructions is 'loop'.
> Look at http://www.agner.org/optimize/instruction_tables.pdf
> you don't want to be using 'loop' on intel cpus.
>
I'm not following. We can replace loop with decl %ecx and jg, but why
is that better?

Tom

> You might get some benefit from pipelining the loop (so you do
> a read to register in one iteration and a register-register adc
> the next).
>
> David
>


Re: [RFC v2] iwlwifi: pcie: transmit queue auto-sizing

2016-02-04 Thread Ben Greear

On 02/04/2016 12:56 PM, Grumbach, Emmanuel wrote:



On 02/04/2016 10:46 PM, Ben Greear wrote:

On 02/04/2016 12:16 PM, Emmanuel Grumbach wrote:

As many (all?) WiFi devices, Intel WiFi devices have
transmit queues which have 256 transmit descriptors
each and each descriptor corresponds to an MPDU.
This means that when it is full, the queue contains
256 * ~1500 bytes to be transmitted (if we don't have
A-MSDUs). The purpose of those queues is to have enough
packets to be ready for transmission so that when the device
gets an opportunity to transmit (TxOP), it can take as many
packets as the spec allows and aggregate them into one
A-MPDU or even several A-MPDUs if we are using bursts.

I guess this is only really usable if you have exactly one
peer connected (ie, in station mode)?

Otherwise, you could have one slow peer and one fast one,
and then I suspect this would not work so well?


Yes. I guess this is one (big) limitation. I guess that what would happen
in this case is that the latency would constantly jitter. But I also
noticed that I could reduce the transmit queue to 130 descriptors
(instead of 256) and still reach maximal throughput because we can
refill the queues quickly enough.
In iwlwifi, we have plans to have one queue for each peer.
This is under development. Not sure when it'll be ready. It also requires
a firmware change, obviously.


Per-peer queues will probably be nice, especially if we can keep the
buffer bloat manageable.


For reference, ath10k has around 1400 tx descriptors, though
in practice not all are usable, and in stock firmware, I'm guessing
the NIC will never be able to actually fill up its tx descriptors
and stop traffic.  Instead, it just allows the stack to try to
TX, then drops the frame...


1400 descriptors, ok... but they are not organised in queues?
(forgive my ignorance of athX drivers)


I think all the details are in the firmware, at least for now.

The firmware details are probably not something I should go into, but suffice
it to say it's complex and varies between firmware versions in non-trivial ways.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH v2 2/4] lib: update single-char callers of strtobool

2016-02-04 Thread Andy Shevchenko
On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
> Some callers of strtobool were passing a pointer to unterminated strings.
> In preparation of adding multi-character processing to kstrtobool, update
> the callers to not pass single-character pointers, and switch to using the
> new kstrtobool_from_user helper where possible.

Looks much better now!
My comment below.

>
> Signed-off-by: Kees Cook 
> Cc: Amitkumar Karwar 
> Cc: Nishant Sarmukadam 
> Cc: Kalle Valo 
> Cc: Steve French 
> Cc: linux-c...@vger.kernel.org
> ---
>  drivers/net/wireless/marvell/mwifiex/debugfs.c | 10 ++---
>  fs/cifs/cifs_debug.c   | 58 
> +++---
>  fs/cifs/cifs_debug.h   |  2 +-
>  fs/cifs/cifsfs.c   |  6 +--
>  fs/cifs/cifsglob.h |  4 +-
>  5 files changed, 26 insertions(+), 54 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
> b/drivers/net/wireless/marvell/mwifiex/debugfs.c
> index 0b9c580af988..bd061b02bc04 100644
> --- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
> +++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
> @@ -880,14 +880,12 @@ mwifiex_reset_write(struct file *file,
>  {
> struct mwifiex_private *priv = file->private_data;
> struct mwifiex_adapter *adapter = priv->adapter;
> -   char cmd;
> bool result;
> +   int rc;
>
> -   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
> -   return -EFAULT;
> -
> -   if (strtobool(&cmd, &result))
> -   return -EINVAL;
> +   rc = kstrtobool_from_user(ubuf, count, 0, &result);
> +   if (rc)
> +   return rc;
>
> if (!result)
> return -EINVAL;
> diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
> index 50b268483302..6ee59abcb69b 100644
> --- a/fs/cifs/cifs_debug.c
> +++ b/fs/cifs/cifs_debug.c
> @@ -255,7 +255,6 @@ static const struct file_operations 
> cifs_debug_data_proc_fops = {
>  static ssize_t cifs_stats_proc_write(struct file *file,
> const char __user *buffer, size_t count, loff_t *ppos)
>  {
> -   char c;
> bool bv;
> int rc;
> struct list_head *tmp1, *tmp2, *tmp3;
> @@ -263,11 +262,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
> struct cifs_ses *ses;
> struct cifs_tcon *tcon;
>
> -   rc = get_user(c, buffer);
> -   if (rc)
> -   return rc;
> -
> -   if (strtobool(&c, &bv) == 0) {
> +   rc = kstrtobool_from_user(buffer, count, 0, &bv);
> +   if (rc == 0) {
>  #ifdef CONFIG_CIFS_STATS2
> atomic_set(, 0);
> atomic_set(, 0);
> @@ -290,6 +286,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
> }
> }
> spin_unlock(&cifs_tcp_ses_lock);
> +   } else {
> +   return rc;
> }
>
> return count;
> @@ -433,17 +431,17 @@ static int cifsFYI_proc_open(struct inode *inode, 
> struct file *file)
>  static ssize_t cifsFYI_proc_write(struct file *file, const char __user 
> *buffer,
> size_t count, loff_t *ppos)
>  {
> -   char c;
> +   char c[2] = { '\0' };
> bool bv;
> int rc;
>
> -   rc = get_user(c, buffer);
> +   rc = get_user(c[0], buffer);

> if (rc)
> return rc;
> -   if (strtobool(&c, &bv) == 0)
> +   if (strtobool(c, &bv) == 0)
> cifsFYI = bv;
> -   else if ((c > '1') && (c <= '9'))
> -   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings */
> +   else if ((c[0] > '1') && (c[0] <= '9'))
> +   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for 
> meanings */
>
> return count;
>  }
> @@ -471,20 +469,12 @@ static int cifs_linux_ext_proc_open(struct inode 
> *inode, struct file *file)
>  static ssize_t cifs_linux_ext_proc_write(struct file *file,
> const char __user *buffer, size_t count, loff_t *ppos)
>  {
> -   char c;
> -   bool bv;
> int rc;
>
> -   rc = get_user(c, buffer);
> +   rc = kstrtobool_from_user(buffer, count, 0, &linuxExtEnabled);
> if (rc)
> return rc;
>
> -   rc = strtobool(&c, &bv);
> -   if (rc)
> -   return rc;
> -
> -   linuxExtEnabled = bv;
> -
> return count;
>  }
>
> @@ -511,20 +501,12 @@ static int cifs_lookup_cache_proc_open(struct inode 
> *inode, struct file *file)
>  static ssize_t cifs_lookup_cache_proc_write(struct file *file,
> const char __user *buffer, size_t count, loff_t *ppos)
>  {
> -   char c;
> -   bool bv;
> int rc;
>
> -   rc = get_user(c, buffer);
> +   rc = kstrtobool_from_user(buffer, count, 0, &lookupCacheEnabled);
> if (rc)
> return rc;
>
> -   rc = strtobool(&c, &bv);
> -   if (rc)
> -   

[PATCH v3 net 7/7] net: mvneta: Fix race condition during stopping

2016-02-04 Thread Gregory CLEMENT
When stopping the port, the CPU notifier is still registered while the
mvneta_stop_dev function calls mvneta_percpu_disable() on each CPU.
A new CPU coming online at this point could race with the stop path.

This patch adds a flag preventing the notifier code from being executed
for a new CPU while the port is stopping. It also uses the spinlock
introduced previously. To avoid a deadlock, the lock has been moved
outside the mvneta_percpu_elect function.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 36 +++
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index b12a745a0e4c..b0ae69f84493 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -374,6 +374,7 @@ struct mvneta_port {
 * ensuring that the configuration remains coherent.
 */
spinlock_t lock;
+   bool is_stopped;
 
/* Core clock */
struct clk *clk;
@@ -2855,16 +2856,14 @@ static void mvneta_percpu_disable(void *arg)
disable_percpu_irq(pp->dev->irq);
 }
 
+/* Electing a CPU must be done in an atomic way: it should be done
+ * after or before the removal/insertion of a CPU and this function is
+ * not reentrant.
+ */
 static void mvneta_percpu_elect(struct mvneta_port *pp)
 {
int elected_cpu = 0, max_cpu, cpu, i = 0;
 
-   /* Electing a CPU must be done in an atomic way: it should be
-* done after or before the removal/insertion of a CPU and
-* this function is not reentrant.
-*/
-   spin_lock(&pp->lock);
-
/* Use the cpu associated to the rxq when it is online, in all
 * the other cases, use the cpu 0 which can't be offline.
 */
@@ -2908,7 +2907,6 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
i++;
 
}
-   spin_unlock(&pp->lock);
 };
 
 static int mvneta_percpu_notifier(struct notifier_block *nfb,
@@ -2922,6 +2920,14 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
switch (action) {
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
+   spin_lock(&pp->lock);
+   /* Configuring the driver for a new CPU while the
+* driver is stopping is racy, so just avoid it.
+*/
+   if (pp->is_stopped) {
+   spin_unlock(&pp->lock);
+   break;
+   }
netif_tx_stop_all_queues(pp->dev);
 
/* We have to synchronise on tha napi of each CPU
@@ -2959,6 +2965,7 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
MVNETA_CAUSE_LINK_CHANGE |
MVNETA_CAUSE_PSC_SYNC_CHANGE);
netif_tx_start_all_queues(pp->dev);
+   spin_unlock(&pp->lock);
break;
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
@@ -2983,7 +2990,9 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
case CPU_DEAD:
case CPU_DEAD_FROZEN:
/* Check if a new CPU must be elected now this on is down */
+   spin_lock(&pp->lock);
mvneta_percpu_elect(pp);
+   spin_unlock(&pp->lock);
/* Unmask all ethernet port interrupts */
on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
mvreg_write(pp, MVNETA_INTR_MISC_MASK,
@@ -3027,7 +3036,7 @@ static int mvneta_open(struct net_device *dev)
 */
on_each_cpu(mvneta_percpu_enable, pp, true);
 
-
+   pp->is_stopped = false;
/* Register a CPU notifier to handle the case where our CPU
 * might be taken offline.
 */
@@ -3060,9 +3069,18 @@ static int mvneta_stop(struct net_device *dev)
 {
struct mvneta_port *pp = netdev_priv(dev);
 
+   /* Inform that we are stopping so we don't want to setup the
+* driver for new CPUs in the notifiers
+*/
+   spin_lock(&pp->lock);
+   pp->is_stopped = true;
mvneta_stop_dev(pp);
mvneta_mdio_remove(pp);
unregister_cpu_notifier(&pp->cpu_notifier);
+   /* Now that the notifiers are unregistered, we can release the
+* lock
+*/
+   spin_unlock(&pp->lock);
on_each_cpu(mvneta_percpu_disable, pp, true);
free_percpu_irq(dev->irq, pp->ports);
mvneta_cleanup_rxqs(pp);
@@ -,7 +3351,9 @@ static int  mvneta_config_rss(struct mvneta_port *pp)
mvreg_write(pp, MVNETA_PORT_CONFIG, val);
 
/* Update the elected CPU matching the new rxq_def */
+   spin_lock(&pp->lock);
mvneta_percpu_elect(pp);
+   spin_unlock(&pp->lock);
 
/* We have to synchronise on the napi of each CPU */
for_each_online_cpu(cpu) {
-- 
2.5.0



Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Tom Herbert
On Thu, Feb 4, 2016 at 1:46 PM, Linus Torvalds
 wrote:
> I missed the original email (I don't have net-devel in my mailbox),
> but based on Ingo's quoting have a more fundamental question:
>
> Why wasn't that done with C code instead of asm with odd numerical targets?
>
The reason I did this in assembly is precisely about the your point of
having to close the carry chains with adcq $0. I do have a first
implementation in C which using switch() to handle alignment, excess
length less than 8 bytes, and the odd number of quads to sum in the
main loop. gcc turns these switch statements into jump tables (not
function tables which is what Ingo's example code was using). The
problem I hit was that for each case I needed to close the carry chain
in the inline asm so fall through wouldn't have much value and each
case is expanded. The C version using switch gave a nice performance
gain, moving to all assembly was somewhat better.
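
For illustration, each case in the C version ended up looking roughly like
this (a sketch, not the exact code I have):

	case 2:
		asm("addq 0(%[src]),%[res]\n\t"
		    "adcq 8(%[src]),%[res]\n\t"
		    "adcq $0,%[res]"		/* close the carry chain */
		    : [res] "+r" (sum)
		    : [src] "r" (buff)
		    : "memory");
		break;
	case 1:
		asm("addq 0(%[src]),%[res]\n\t"
		    "adcq $0,%[res]"		/* close the carry chain */
		    : [res] "+r" (sum)
		    : [src] "r" (buff)
		    : "memory");
		break;

so each case carries its own trailing "adcq $0" and there is no useful
fall-through between cases.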

There is also the question of alignment. If we really don't need to worry
about alignment at all on x86, then we should be able to eliminate the
complexity of dealing with it.

> It seems likely that the real issue is avoiding the short loops (that
> will cause branch prediction problems) and use a lookup table instead.
>
> But we can probably do better than that asm.
>
> For example, for the actual "8 bytes or shorter" case, I think
> something like this might just work fine:
>
>   unsigned long csum_partial_8orless(const unsigned char *buf,
> unsigned long len, unsigned long sum)
>   {
> static const unsigned long mask[9] = {
> 0x0000000000000000,
> 0x000000000000ff00,
> 0x000000000000ffff,
> 0x00000000ff00ffff,
> 0x00000000ffffffff,
> 0x0000ff00ffffffff,
> 0x0000ffffffffffff,
> 0xff00ffffffffffff,
> 0xffffffffffffffff };
> unsigned long val = load_unaligned_zeropad(buf + (len & 1));
> val &= mask[len];
> asm("addq %1,%0 ; adcq $0,%0":"=r" (sum):"r" (val), "0" (sum));
> return sum;
>   }
>
I will look at doing that.

Thanks,
Tom

> NOTE! The above is 100% untested. But I _think_ it may handle the
> odd-byte-case correctly, and it should result in just one 8-byte load
> (the "load_unaligned_zeropad()" is just in case that ends up
> overflowing and we have page-alloc-debug triggering a page fault at
> the end). All without looping or any conditional branches that might
> mispredict.
>
> My point is that going to assembly results in pretty hard-to-read
> code, but it's also fairly inflexible. If we stay with C, we can much
> more easily play tricks. So for example, make the above an inline
> function, and now you can do things like this:
>
>   static inline unsigned long csum_partial_64bit(void *buf, unsigned
> long len, unsigned long sum)
>   {
> if (len <= 8)
> return csum_partial_8orless(buf, len, sum);
>
> /* We know it's larger than 8 bytes, so handle alignment */
> align = 7 & -(unsigned long)buf;
> sum = csum_partial_8orless(buf, align, sum);
> buf += align;
>
> /* We need to do the big-endian thing */
> sum = rotate_by8_if_odd(sum, align);
>
> /* main loop for big packets */
> .. do the unrolled inline asm thing that we already do ..
>
> sum = rotate_by8_if_odd(sum, align);
>
> /* Handle the last bytes */
> return csum_partial_8orless(buf, len, sum);
>   }
>
>   /* Fold the 64-bit sum we computed down to 32 bits __wsum */
>   __wsum int csum_partial(void *buf, unsigned int len, __wsum partial)
>   {
> unsigned long sum = csum_partial_64bit(ptr, len, partial);
> asm("addl %1,%0 ; adcl $0,%0":"=r" (sum):"r" (sum >> 32), "0" (sum));
> return sum;
>  }
>
> or something like that.
>
> NOTE NOTE NOTE! I did a half-arsed attempt at getting the whole
> "big-endian 16-bit add" thing right by doing the odd byte masking in
> the end cases, and by rotating the sum by 8 bits around the
> 8-byte-unrolled-loop, but I didn't test the above. It's literally
> written in my mail client. So I can almost guarantee that it is buggy,
> but it is meant as an *example* of "why not do it this way" rather
> than production code.
>
> I think that writing it in C and trying to be intelligent about it
> like the above would result in more maintainable code, and it is
> possible that it would even be faster.
>
> Yes, writing it in C *does* likely result in a few more cases of "adcq
> $0" in order to finish up the carry calculations. The *only* advantage
> of inline asm is how it allows you to keep the carry flag around. So
> there is downside to the C model, and it might cause a cycle or two of
> extra work, but the upside of C is that you can try to do clever
> things without turning the code completely unreadable.
>
> For example, doing the exception handling (that 

Re: [PATCH v2 3/4] lib: add "on"/"off" support to kstrtobool

2016-02-04 Thread Andy Shevchenko
On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
> Add support for "on" and "off" when converting to boolean.
>
> Signed-off-by: Kees Cook 
> ---
>  lib/kstrtox.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index e18f088704d7..09e83a19a96d 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -347,6 +347,20 @@ int kstrtobool(const char *s, unsigned int base, bool 
> *res)

Forgot update description?

> case '0':
> *res = false;
> return 0;
> +   case 'o':
> +   case 'O':
> +   switch (s[1]) {
> +   case 'n':
> +   case 'N':
> +   *res = true;
> +   return 0;
> +   case 'f':
> +   case 'F':
> +   *res = false;
> +   return 0;
> +   default:
> +   break;
> +   }
> default:
> break;
> }
> --
> 2.6.3
>



-- 
With Best Regards,
Andy Shevchenko


Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe

2016-02-04 Thread Pablo Neira Ayuso
On Thu, Feb 04, 2016 at 10:16:56AM +0100, Jiri Pirko wrote:
> Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastab...@gmail.com wrote:
> >
> >Also by adding get_parse_graph and set_parse_graph attributes as
> >in my previous flow_api work we can build programmable devices
> >and programmatically learn when rules can or can not be loaded
> >into the hardware. Again future work.
> >
> >Any comments/feedback appreciated.
> 
> I like this being a thin and elegant solution. However, ~2 years ago when I
> pushed the openvswitch kernel datapath offload patchset, people were yelling
> at me that it was not a generic enough solution, that tc had to be able
> to use the api (Jamal :)), and nftables as well.

I would be glad to join this debate during NetDev 1.1 too.

I think we should provide a solution that allows people to use both
tc and nftables. This would require a bit of generic infrastructure on
top of it so we don't restrict users to one single solution; in other
words, we allow the user to select their own poison.

> Now this patch is making offload strictly tc-based and nobody seems to
> care :) I do. I think that we might try to find some generic middle layer.

I agree and I'll be happy to help to push this ahead. Let's try to sit
and get together to resolve this.

See you soon.


Re: [PATCH 1/2] ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands

2016-02-04 Thread David Miller
From: Jacob Keller 
Date: Tue,  2 Feb 2016 15:22:06 -0800

> Ethtool supports a few operations for modifying and controlling
> a device's RSS table. Sometimes, changes in other features of the device
> may require (or desire) changes to the RSS table. Currently there is no
> method to indicate to the driver whether the current RSS table settings
> should be maintained or overridden.

Yes, there certainly is a way to indicate this.

If the user asks for the change in the number of queues, and you
cannot retain the user's requested RSS settings, then you must fail
the queue setting change.

And vice versa.

You can't say to the user "I can adhere to your requested configuration
change, but I might undo it for some unspecified reason"

That's unacceptable behavior, and that's exactly what this dynamic
flag means.

If you cannot give the user what he asks for, precisely and reliably,
you fail the operation with an error.

There is no way I am adding code which allows these "maybe" kind of
configuration operations.  Either you can or you can't, and you tell
the user when you can't by erroring out on the operation that
invalidates the requirements.




Re: [PATCH v2 2/4] lib: update single-char callers of strtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 2:59 PM, Andy Shevchenko
 wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
>> Some callers of strtobool were passing a pointer to unterminated strings.
>> In preparation of adding multi-character processing to kstrtobool, update
>> the callers to not pass single-character pointers, and switch to using the
>> new kstrtobool_from_user helper where possible.
>
> Looks much better now!
> My comment below.
>
>>
>> Signed-off-by: Kees Cook 
>> Cc: Amitkumar Karwar 
>> Cc: Nishant Sarmukadam 
>> Cc: Kalle Valo 
>> Cc: Steve French 
>> Cc: linux-c...@vger.kernel.org
>> ---
>>  drivers/net/wireless/marvell/mwifiex/debugfs.c | 10 ++---
>>  fs/cifs/cifs_debug.c   | 58 
>> +++---
>>  fs/cifs/cifs_debug.h   |  2 +-
>>  fs/cifs/cifsfs.c   |  6 +--
>>  fs/cifs/cifsglob.h |  4 +-
>>  5 files changed, 26 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
>> b/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> index 0b9c580af988..bd061b02bc04 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> @@ -880,14 +880,12 @@ mwifiex_reset_write(struct file *file,
>>  {
>> struct mwifiex_private *priv = file->private_data;
>> struct mwifiex_adapter *adapter = priv->adapter;
>> -   char cmd;
>> bool result;
>> +   int rc;
>>
>> -   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
>> -   return -EFAULT;
>> -
>> -   if (strtobool(&cmd, &result))
>> -   return -EINVAL;
>> +   rc = kstrtobool_from_user(ubuf, count, 0, &result);
>> +   if (rc)
>> +   return rc;
>>
>> if (!result)
>> return -EINVAL;
>> diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
>> index 50b268483302..6ee59abcb69b 100644
>> --- a/fs/cifs/cifs_debug.c
>> +++ b/fs/cifs/cifs_debug.c
>> @@ -255,7 +255,6 @@ static const struct file_operations 
>> cifs_debug_data_proc_fops = {
>>  static ssize_t cifs_stats_proc_write(struct file *file,
>> const char __user *buffer, size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> bool bv;
>> int rc;
>> struct list_head *tmp1, *tmp2, *tmp3;
>> @@ -263,11 +262,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
>> struct cifs_ses *ses;
>> struct cifs_tcon *tcon;
>>
>> -   rc = get_user(c, buffer);
>> -   if (rc)
>> -   return rc;
>> -
>> -   if (strtobool(&c, &bv) == 0) {
>> +   rc = kstrtobool_from_user(buffer, count, 0, &bv);
>> +   if (rc == 0) {
>>  #ifdef CONFIG_CIFS_STATS2
>> atomic_set(, 0);
>> atomic_set(, 0);
>> @@ -290,6 +286,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
>> }
>> }
>> spin_unlock(&cifs_tcp_ses_lock);
>> +   } else {
>> +   return rc;
>> }
>>
>> return count;
>> @@ -433,17 +431,17 @@ static int cifsFYI_proc_open(struct inode *inode, 
>> struct file *file)
>>  static ssize_t cifsFYI_proc_write(struct file *file, const char __user 
>> *buffer,
>> size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> +   char c[2] = { '\0' };
>> bool bv;
>> int rc;
>>
>> -   rc = get_user(c, buffer);
>> +   rc = get_user(c[0], buffer);
>
>> if (rc)
>> return rc;
>> -   if (strtobool(&c, &bv) == 0)
>> +   if (strtobool(c, &bv) == 0)
>> cifsFYI = bv;
>> -   else if ((c > '1') && (c <= '9'))
>> -   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings 
>> */
>> +   else if ((c[0] > '1') && (c[0] <= '9'))
>> +   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for 
>> meanings */
>>
>> return count;
>>  }
>> @@ -471,20 +469,12 @@ static int cifs_linux_ext_proc_open(struct inode 
>> *inode, struct file *file)
>>  static ssize_t cifs_linux_ext_proc_write(struct file *file,
>> const char __user *buffer, size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> -   bool bv;
>> int rc;
>>
>> -   rc = get_user(c, buffer);
>> +   rc = kstrtobool_from_user(buffer, count, 0, &linuxExtEnabled);
>> if (rc)
>> return rc;
>>
>> -   rc = strtobool(&c, &bv);
>> -   if (rc)
>> -   return rc;
>> -
>> -   linuxExtEnabled = bv;
>> -
>> return count;
>>  }
>>
>> @@ -511,20 +501,12 @@ static int cifs_lookup_cache_proc_open(struct inode 
>> *inode, struct file *file)
>>  static ssize_t cifs_lookup_cache_proc_write(struct file *file,
>> const char __user *buffer, size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> 

Re: [PATCH v2 3/4] lib: add "on"/"off" support to kstrtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 3:00 PM, Andy Shevchenko
 wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
>> Add support for "on" and "off" when converting to boolean.
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  lib/kstrtox.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
>> index e18f088704d7..09e83a19a96d 100644
>> --- a/lib/kstrtox.c
>> +++ b/lib/kstrtox.c
>> @@ -347,6 +347,20 @@ int kstrtobool(const char *s, unsigned int base, bool 
>> *res)
>
> Forgot update description?

Argh, thank you. Good eye. Sent another update.

-Kees

>
>> case '0':
>> *res = false;
>> return 0;
>> +   case 'o':
>> +   case 'O':
>> +   switch (s[1]) {
>> +   case 'n':
>> +   case 'N':
>> +   *res = true;
>> +   return 0;
>> +   case 'f':
>> +   case 'F':
>> +   *res = false;
>> +   return 0;
>> +   default:
>> +   break;
>> +   }
>> default:
>> break;
>> }
>> --
>> 2.6.3
>>
>
>
>
> --
> With Best Regards,
> Andy Shevchenko



-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Alexander Duyck
On Thu, Feb 4, 2016 at 12:59 PM, Tom Herbert  wrote:
> On Thu, Feb 4, 2016 at 9:09 AM, David Laight  wrote:
>> From: Tom Herbert
>> ...
>>> > If nothing else reducing the size of this main loop may be desirable.
>>> > I know the newer x86 is supposed to have a loop buffer so that it can
>>> > basically loop on already decoded instructions.  Normally it is only
>>> > something like 64 or 128 bytes in size though.  You might find that
>>> > reducing this loop to that smaller size may improve the performance
>>> > for larger payloads.
>>>
>>> I saw 128 to be better in my testing. For large packets this loop does
>>> all the work. I see performance dependent on the amount of loop
>>> overhead, i.e. we got it down to two non-adcq instructions but it is
>>> still noticeable. Also, this helps a lot on sizes up to 128 bytes
>>> since we only need to do single call in the jump table and no trip
>>> through the loop.
>>
>> But one of your 'loop overhead' instructions is 'loop'.
>> Look at http://www.agner.org/optimize/instruction_tables.pdf
>> you don't want to be using 'loop' on intel cpus.
>>
> I'm not following. We can replace loop with decl %ecx and jg, but why
> is that better?

Because loop takes something like 7 cycles whereas the decl/jg
approach takes 2 or 3.  It is probably one of the reasons things look
so much better with the loop unrolled.
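
In other words the difference is between a loop body like

	/* sketch of the loop shape only, not the actual code in the patch */
	100:	adcq	0(%rdi), %rax
		leaq	8(%rdi), %rdi
		loop	100b		/* microcoded, ~7 cycles on Intel */

and

	100:	adcq	0(%rdi), %rax
		leaq	8(%rdi), %rdi
		decl	%ecx		/* sets ZF but leaves CF alone */
		jnz	100b		/* a couple of cycles */

where the dec/jnz pair is cheap enough that the remaining overhead is
mostly down to the unrolling factor.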

- Alex


Re: [PATCH next] ipvlan: inherit MTU from master device

2016-02-04 Thread David Miller
From: Mahesh Bandewar 
Date: Wed, 27 Jan 2016 23:33:28 -0800

> From: Mahesh Bandewar 
> 
> When we create an IPvlan slave, we use ether_setup() and that
> sets up the default MTU to 1500 while the master device may have
> a lower / different MTU. Any subsequent changes to the master's
> MTU are reflected into the slaves' MTU setting. However, if those
> don't happen (the most likely scenario), the slaves' MTU stays at
> 1500, which could be bad.
> 
> This change adds code to inherit MTU from the master device
> instead of using the default value during the link initialization
> phase.
> 
> Signed-off-by: Mahesh Bandewar 

Applied, thanks.


Re: [PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-04 Thread Rasmus Villemoes
On Thu, Feb 04 2016, Kees Cook  wrote:

> Create the kstrtobool_from_user helper and moves strtobool logic into
> the new kstrtobool (matching all the other kstrto* functions). Provides
> an inline wrapper for existing strtobool callers.
>
> Signed-off-by: Kees Cook 
> ---
>  include/linux/kernel.h |  3 +++
>  include/linux/string.h |  6 +-
>  lib/kstrtox.c  | 35 +++
>  lib/string.c   | 29 -
>  4 files changed, 43 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index f31638c6e873..cdc25f47a23f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
> base, u16 *res);
>  int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
>  int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
>  int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
> +int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
>  
>  int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
> unsigned int base, unsigned long long *res);
>  int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
> unsigned int base, long long *res);
> @@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user 
> *s, size_t count, unsigne
>  int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
> unsigned int base, s16 *res);
>  int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
> unsigned int base, u8 *res);
>  int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
> unsigned int base, s8 *res);
> +int __must_check kstrtobool_from_user(const char __user *s, size_t count,
> +   unsigned int base, bool *res);
>  
>  static inline int __must_check kstrtou64_from_user(const char __user *s, 
> size_t count, unsigned int base, u64 *res)
>  {
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 9eebc66d957a..d2fb21b1081d 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, int 
> *argcp);
>  extern void argv_free(char **argv);
>  
>  extern bool sysfs_streq(const char *s1, const char *s2);
> -extern int strtobool(const char *s, bool *res);
> +extern int kstrtobool(const char *s, unsigned int base, bool *res);
> +static inline int strtobool(const char *s, bool *res)
> +{
> + return kstrtobool(s, 0, res);
> +}
>  
>  #ifdef CONFIG_BINARY_PRINTF
>  int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index 94be244e8441..e18f088704d7 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
>  }
>  EXPORT_SYMBOL(kstrtos8);
>  
> +/**
> + * kstrtobool - convert common user inputs into boolean values
> + * @s: input string
> + * @base: ignored
> + * @res: result
> + *
> + * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
> + * Otherwise it will return -EINVAL.  Value pointed to by res is
> + * updated upon finding a match.
> + */
> +int kstrtobool(const char *s, unsigned int base, bool *res)
> +{

Being able to create the kstrtobool_from_user with a single macro
invocation is convenient, but I don't think that justifies the ugliness
of having an unused parameter. People reading this code or trying to use
the interface will wonder what it's doing there, and it will generate
slightly larger code for all the users of strtobool.

So I'd just make a separate explicit definition of kstrtobool_from_user
(the stack buffer sizing doesn't apply to the strings we want to parse
anyway, though 11 is of course plenty).
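
Something like this is what I mean (an untested sketch, assuming the unused
base argument is dropped as suggested above; the buffer only has to hold the
longest accepted token plus a NUL):

	int kstrtobool_from_user(const char __user *s, size_t count, bool *res)
	{
		/* "y"/"n"/"1"/"0"/"on"/"off" all fit easily */
		char buf[8];

		count = min(count, sizeof(buf) - 1);
		if (copy_from_user(buf, s, count))
			return -EFAULT;
		buf[count] = '\0';
		return kstrtobool(buf, res);
	}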

Rasmus


Re: bonding reports interface up with 0 Mbps

2016-02-04 Thread Jay Vosburgh
Jay Vosburgh  wrote:
[...]
>   Thinking about the trace again... Emil: what happens in the
>trace before this?  Is there ever a call to the ixgbe_get_settings?
>Does a NETDEV_UP or NETDEV_CHANGE event ever hit the bond_netdev_event
>function?

Emil kindly sent me the trace offline, and I think I see what's
going on.  It looks like the sequence of events is:

bond_enslave ->
bond_update_speed_duplex (device is down, thus DUPLEX/SPEED_UNKNOWN)
[ do rest of enslavement, start miimon periodic work ]

[ time passes, device goes carrier up ]

ixgbe_service_task: eth1: NIC Link is Up 10 Gbps ->
netif_carrier_on (arranges for NETDEV_CHANGE notifier out of line)

[ a few microseconds later ]

bond_mii_monitor ->
bond_check_dev_link (now is carrier up)
bond_miimon_commit ->   (emits "0 Mbps full duplex" message)
bond_lower_state_changed ->
bond_netdev_event (NETDEV_CHANGELOWERSTATE, is ignored)
bond_3ad_handle_link_change (sees DUPLEX/SPEED_UNKNOWN)

[ a few microseconds later, in response to ixgbe's netif_carrier_on ]

notifier_call_chain ->
bond_netdev_event NETDEV_CHANGE ->
bond_update_speed_duplex (sees correct SPEED_1/FULL) ->
bond_3ad_adapter_speed_duplex_changed (updates 802.3ad)

Basically, the race is that the periodic bond_mii_monitor is
squeezing in between the link going up and bonding's update of the speed
and duplex in response to the NETDEV_CHANGE triggered by the driver's
netif_carrier_on call.  bonding ends up using the stale duplex and speed
information obtained at enslavement time.

I think that, nowadays, the initial speed and duplex will pretty
much always be UNKNOWN, at least for real Ethernet devices, because it
will take longer to autoneg than the time between the dev_open and
bond_update_speed_duplex calls in bond_enslave.

Adding a case to bond_netdev_event for CHANGELOWERSTATE works
because it's a synchronous call from bonding.  For purposes of fixing
this, it's more or less equivalent to calling bond_update_speed_duplex
from bond_miimon_commit (which is part of a test patch I posted earlier
today).

If the above analysis is correct, then I would expect this patch
to make the problem go away:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b560558884..cabaeb61333d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2127,6 +2127,7 @@ static void bond_miimon_commit(struct bonding *bond)
continue;
 
case BOND_LINK_UP:
+   bond_update_speed_duplex(slave);
bond_set_slave_link_state(slave, BOND_LINK_UP,
  BOND_SLAVE_NOTIFY_NOW);
slave->last_link_up = jiffies;


Emil, can you give just the above a test?

I don't see in the trace that there's evidence that ixgbe's link
is rapidly flapping, so I don't think it's necessary to do more than the
above.

Now, separately, bonding really should obey the NETDEV_CHANGE /
NETDEV_UP events instead of polling for carrier state, but if the above
patch works it's a simple fix that is easily backported, which the
CHANGELOWERSTATE method isn't, and the new way (notifier driven) can be
net-next material.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH iproute2] ipmonitor: match user option 'all' before 'all-nsid'

2016-02-04 Thread Stephen Hemminger
On Tue,  2 Feb 2016 16:53:40 -0800
Roopa Prabhu  wrote:

> From: Roopa Prabhu 
> 
> 'ip monitor all' is broken on older kernels.
> This patch fixes 'ip monitor all' to match
> 'all' and not 'all-nsid'.
> 
> It moves parsing arg 'all-nsid' to after parsing
> 'all'.
> 
> Before:
> $ip monitor all
> NETLINK_LISTEN_ALL_NSID: Protocol not available
> 
> After:
> $ip monitor all
> [NEIGH]Deleted 10.0.0.1 dev eth1 lladdr c4:54:44:4f:b2:dd STALE
> 
> Fixes: 449b824ad196 ("ipmonitor: allows to monitor in several netns")
> Signed-off-by: Roopa Prabhu 

Applied thanks.


Re: [RESEND PATCH iproute2] tc: fix compilation with old gcc (< 4.6) (bis)

2016-02-04 Thread Stephen Hemminger
On Wed, 3 Feb 2016 08:25:00 +
Nicolas Dichtel  wrote:

> Commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 4.6)") was reverted
> to ease the merge of the net-next branch.
> 
> Here is the new version.
> 
> Signed-off-by: Nicolas Dichtel 
> Signed-off-by: Daniel Borkmann 

Applied


Re: [PATCH iproute2] geneve: add support for lwt tunnel creation and dst port selection

2016-02-04 Thread Stephen Hemminger
On Thu, 28 Jan 2016 14:48:55 +0100
Paolo Abeni  wrote:

> This change add the ability to create lwt/flow based/externally
> controlled geneve device and to select the udp destination port used
> by a full geneve tunnel.
> 
> Signed-off-by: Paolo Abeni 

Applied


Reply: [net] bonding: use return instead of goto

2016-02-04 Thread 张胜举
> On Wed, Feb 03, 2016 at 06:15:22AM +, Zhang Shengju wrote:
> > Replace 'goto' with 'return' to remove unnecessary check at label:
> > err_undo_flags.
> 
> I think you're going to have to explain how you came to the conclusion that
> the check isn't necessary.
> 
> --
> Jarod Wilson
> ja...@redhat.com
Hi Jarod,

The reason is that 'err_undo_flags' does two things for the first slave
device:
1. revert the bond mac address if it was set from the slave device.
2. revert the bond device type if it's not ARPHRD_ETHER.

I think the check isn't necessary in these three places: they change
neither the bond mac address nor the device type, so it's straightforward
to return directly.

Thanks,
Shengju





Re: [PATCH net-next 2/2] sfc: implement IPv6 NFC (and IPV4_USER_FLOW)

2016-02-04 Thread Ben Hutchings
On Tue, 2016-02-02 at 18:49 +, Edward Cree wrote:
> Signed-off-by: Edward Cree 
> ---
>  drivers/net/ethernet/sfc/ethtool.c | 176 
> +
>  1 file changed, 176 insertions(+)
> 
> diff --git a/drivers/net/ethernet/sfc/ethtool.c 
> b/drivers/net/ethernet/sfc/ethtool.c
> index 0347976..49fac36 100644
> --- a/drivers/net/ethernet/sfc/ethtool.c
> +++ b/drivers/net/ethernet/sfc/ethtool.c
[...]
>  static int efx_ethtool_get_class_rule(struct efx_nic *efx,
>     struct ethtool_rx_flow_spec *rule)
>  {
[...]
> @@ -855,6 +896,39 @@ static int efx_ethtool_get_class_rule(struct efx_nic 
> *efx,
>   mac_entry->h_proto = spec.ether_type;
>   mac_mask->h_proto = ETHER_TYPE_FULL_MASK;
>   }
> + } else if (spec.match_flags & EFX_FILTER_MATCH_ETHER_TYPE &&
> +    spec.ether_type == htons(ETH_P_IP)) {

Shouldn't this also check that no unhandled match flags are set?

> + rule->flow_type = IPV4_USER_FLOW;
> + uip_entry->ip_ver = ETH_RX_NFC_IP4;
> + if (spec.match_flags & EFX_FILTER_MATCH_IP_PROTO) {
> + uip_mask->proto = IP_PROTO_FULL_MASK;
> + uip_entry->proto = spec.ip_proto;
> + }
> + if (spec.match_flags & EFX_FILTER_MATCH_LOC_HOST) {
> + uip_entry->ip4dst = spec.loc_host[0];
> + uip_mask->ip4dst = IP4_ADDR_FULL_MASK;
> + }
> + if (spec.match_flags & EFX_FILTER_MATCH_REM_HOST) {
> + uip_entry->ip4src = spec.rem_host[0];
> + uip_mask->ip4src = IP4_ADDR_FULL_MASK;
> + }
> + } else if (spec.match_flags & EFX_FILTER_MATCH_ETHER_TYPE &&
> +    spec.ether_type == htons(ETH_P_IPV6)) {

Same here.

[...]
>  static int efx_ethtool_set_class_rule(struct efx_nic *efx,
>   struct ethtool_rx_flow_spec *rule)
>  {
[...]
> +   case IPV6_USER_FLOW:
> +   if (uip6_mask->l4_4_bytes || uip6_mask->tos)
> +   return -EINVAL;
> +   spec.match_flags = EFX_FILTER_MATCH_ETHER_TYPE;
> +   spec.ether_type = htons(ETH_P_IPV6);
> + if (!ip6_mask_is_empty(ip6_mask->ip6dst)) {

This should use uip6_mask not ip6_mask.

> + if (!ip6_mask_is_full(uip6_mask->ip6dst))
> + return -EINVAL;
> + spec.match_flags |= EFX_FILTER_MATCH_LOC_HOST;
> + memcpy(spec.loc_host, uip6_entry->ip6dst, 
> sizeof(spec.loc_host));
> + }
> + if (!ip6_mask_is_empty(ip6_mask->ip6src)) {
[...]

Same here.

Ben.

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein



[PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces

2016-02-04 Thread Salam Noureddine
This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.

Signed-off-by: Salam Noureddine 
---
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 net/core/neighbour.c| 38 +++---
 net/ipv4/arp.c  |  4 
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
  const unsigned char *src_hw, const unsigned char *th);
 int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
 void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
 
 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
   struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, 
u8 new, u32 flags);
 void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do {  \
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+bool all_down);
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+   bool all_down)
 {
int i;
struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, 
struct net_device *dev)
np = &n->next;
continue;
}
+   if (!dev && n->dev && all_down) {
+   if (n->dev->flags & IFF_UP) {
+   np = &n->next;
+   continue;
+   }
+   }
rcu_assign_pointer(*np,
   rcu_dereference_protected(n->next,
lockdep_is_held(&tbl->lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct 
net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(&tbl->lock);
-   neigh_flush_dev(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
write_unlock_bh(&tbl->lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(&tbl->lock);
-   neigh_flush_dev(tbl, dev);
-   pneigh_ifdown(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
+   pneigh_ifdown(tbl, dev, false);
write_unlock_bh(&tbl->lock);
 
del_timer_sync(&tbl->proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct 
net_device *dev)
 }
 EXPORT_SYMBOL(neigh_ifdown);
 
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+   write_lock_bh(&tbl->lock);
+   neigh_flush_dev(tbl, NULL, true);
+   pneigh_ifdown(tbl, NULL, true);
+   write_unlock_bh(&tbl->lock);
+
+   del_timer_sync(&tbl->proxy_timer);
+   pneigh_queue_purge(&tbl->proxy_queue);
+   return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
 static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct 
net_device *dev)
 {
struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, 
const void *pkey,
return -ENOENT;
 }
 
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+   

Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Linus Torvalds
On Thu, Feb 4, 2016 at 5:27 PM, Linus Torvalds
 wrote:
> sum = csum_partial_lt8(*(unsigned long *)buff, len, sum);
> return rotate_by8_if_odd(sum, align);

Actually, that last word-sized access to "buff" might be past the end
of the buffer. The code does the right thing if "len" is zero, except
for the possible page fault or address verifier complaint.

So that very last call to "csum_partial_lt8()" either needs to be
conditional (easy enough to add an "if (len)" around that whole
statement) or the thing could be unconditional but the load needs to
use "load_unaligned_zeropad()" so that the exception is harmless.

It's probably immaterial which one you pick. The few possibly useless
ALU operations vs a possible branch misprodict penalty are probably
going to make it a wash. The exception will never happen in practice,
but if DEBUG_PAGEALLOC is enabled, or if something like KASAN is
active, it will complain loudly if it happens to go past the
allocation.
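
That is, either

	/* skip the tail load entirely when there is no tail */
	if (len)
		sum = csum_partial_lt8(*(unsigned long *)buff, len, sum);

or

	/* always do the load, but tolerate running off the end of the allocation */
	sum = csum_partial_lt8(load_unaligned_zeropad(buff), len, sum);

(both as untested as the rest of this, of course).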

   Linus


Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-04 Thread Linus Torvalds
On Thu, Feb 4, 2016 at 2:09 PM, Linus Torvalds
 wrote:
>
> The "+" should be "-", of course - the point is to shift up the value
> by 8 bits for odd cases, and we need to load starting one byte early
> for that. The idea is that we use the byte shifter in the load unit to
> do some work for us.

Ok, so I thought some more about this, and the fact is, we don't
actually want to do the byte shifting at all for the first case (the
"length < 8" case), since the address of that one hasn't been shifted.

it's only for the "we're going to align things to 8 bytes" case that
we would want to do it. But then we might as well use the
rotate_by8_if_odd() model, so I suspect the address games are just
entirely pointless.

So here is something that is actually tested (although admittedly not
well), and uses that fairly simple model.

NOTE! I did not do the unrolling of the "adcq" loop in the middle, but
that's a totally trivial thing now. So this isn't very optimized,
because it will do a *lot* of extra "adcq $0" to get rid of the carry
bit. But with that core loop unrolled, you'd get rid of most of them.

  Linus

---
static unsigned long rotate_by8_if_odd(unsigned long sum, unsigned long aligned)
{
asm("rorq %b1,%0"
:"=r" (sum)
:"c" ((aligned & 1) << 3), "0" (sum));
return sum;
}

static unsigned long csum_partial_lt8(unsigned long val, int len,
unsigned long sum)
{
unsigned long mask = (1ul << len*8)-1;
val &= mask;
return add64_with_carry(val, sum);
}

static unsigned long csum_partial_64(const void *buff, unsigned long
len, unsigned long sum)
{
unsigned long align, val;

// This is the only potentially unaligned access, and it can
// also theoretically overflow into the next page
val = load_unaligned_zeropad(buff);
if (len < 8)
return csum_partial_lt8(val, len, sum);

align = 7 & -(unsigned long)buff;
sum = csum_partial_lt8(val, align, sum);
buff += align;
len -= align;

sum = rotate_by8_if_odd(sum, align);
while (len >= 8) {
val = *(unsigned long *) buff;
sum = add64_with_carry(sum, val);
buff += 8;
len -= 8;
}
sum = csum_partial_lt8(*(unsigned long *)buff, len, sum);
return rotate_by8_if_odd(sum, align);
}

__wsum csum_partial(const void *buff, unsigned long len, unsigned long sum)
{
sum = csum_partial_64(buff, len, sum);
return add32_with_carry(sum, sum >> 32);
}


Re: [PATCH net-next 1/2] ethtool: add IPv6 to the NFC API

2016-02-04 Thread Ben Hutchings
On Tue, 2016-02-02 at 18:49 +, Edward Cree wrote:
> Signed-off-by: Edward Cree 
> ---
>  include/uapi/linux/ethtool.h | 70 
> 
>  1 file changed, 64 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 57fa390..3b6af3e 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -748,6 +748,56 @@ struct ethtool_usrip4_spec {
>   __u8proto;
>  };
>  
> +/**
> + * struct ethtool_tcpip6_spec - flow specification for TCP/IPv6 etc.
> + * @ip6src: Source host
> + * @ip6dst: Destination host
> + * @psrc: Source port
> + * @pdst: Destination port
> + * @tos: Type-of-service
> + *
> + * This can be used to specify a TCP/IPv6, UDP/IPv6 or SCTP/IPv6 flow.
> + */
> +struct ethtool_tcpip6_spec {
> + __be32  ip6src[4];
> + __be32  ip6dst[4];
> + __be16  psrc;
> + __be16  pdst;
> + __u8tos;
[...]

IPv6 has 'Traffic Class' instead of 'Type of Service'.  At least the
kernel-doc comments should use the proper name, and perhaps you should
rename the 'tos' fields to something like 'tclass'.  (Definitely not
just 'class' as UAPI headers have to be C++ compatible.)

Ben.

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein



Reply: [PATCH iproute2] ip-link: remove warning message

2016-02-04 Thread 张胜举
> On Thu, 21 Jan 2016 02:23:49 +
> Zhang Shengju  wrote:
> 
> > the warning was:
> > iproute.c:301:12: warning: 'val' may be used uninitialized in this
> > function [-Wmaybe-uninitialized]
> >features &= ~RTAX_FEATURE_ECN;
> > ^
> > iproute.c:575:10: note: 'val' was declared here
> >__u32 val;
> >   ^
> >
> > Signed-off-by: Zhang Shengju 
> > ---
> >  ip/iproute.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/ip/iproute.c b/ip/iproute.c index d5e3ebe..afe70e1 100644
> > --- a/ip/iproute.c
> > +++ b/ip/iproute.c
> > @@ -572,7 +572,7 @@ int print_route(const struct sockaddr_nl *who,
> struct nlmsghdr *n, void *arg)
> > mxlock =
> *(unsigned*)RTA_DATA(mxrta[RTAX_LOCK]);
> >
> > for (i=2; i<= RTAX_MAX; i++) {
> > -   __u32 val;
> > +   __u32 val = 0U;
> >
> > if (mxrta[i] == NULL)
> > continue;
> 
> Your compiler is doing bad dependency analysis.
> There is not really a bug here.
> 
> It would still be best to initialize to keep broken compilers from causing
> warning.

Yes, it's not really a bug here. This variable is always set before it is
used. This patch just wants to remove the warning.

Thanks,
Shengju





Re: [PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 4:11 PM, Kees Cook  wrote:
> On Thu, Feb 4, 2016 at 3:04 PM, Andy Shevchenko
>  wrote:
>> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
>>> This changes several users of manual "on"/"off" parsing to use strtobool.
>>> (Which means they will now parse y/n/1/0 meaningfully too.)
>>>
>>
>> I like this change, but can you carefully check the acceptance of the
>> returned value?
>> Briefly I saw 1 or 0 as okay in different places.
>
> Maybe I missed something, but I think this is actually a bug fix. The
> two cases are early_param and __setup:
>
> For early_param, the functions are called when walking the command
> line in do_early_param via parse_args in parse_early_options. Any
> non-zero return values produce a warning (in do_early_param not
> parse_args). So this is a bug fix, since the function I touched would
> (almost) always return 0, even with bad values (i.e. fixes unreported
> bad arguments):
>
> early_param early_parse_stp always 0
> early_param early_parse_topology always 0
> early_param parse_gart_mem always 0 unless !p (then -EINVAL)
>
> For __setup, these are handled by obsolete_checksetup via
> unknown_bootoption via parse_args in start_kernel, as a way to merge
> __setup calls that should really be in param (i.e. non-early __setup).
> Return values are bubbled up into parse_args and hit:
>
> default:
> pr_err("%s: `%s' invalid for parameter `%s'\n",
>doing, val ?: "", param);
> break;
>
> So this is also a bug fix, since these __setup functions returned inverted
> values or always failed:
>
> __setup rtasmsgs_setup always 1
> __setup setup_cede_offline 1 on success, otherwise 0
> __setup setup_hrtimer_hres 1 on success, otherwise 0
> __setup setup_tick_nohz 1 on success, otherwise 0
>
> So if you specified any of these, they would trigger a bogus "invalid
> parameter" report.
>
> I will double-check...

I am wrong! __setup functions (as handled by unknown_bootoption) need
to return 1, or they end up in the init environment. I will send a
fix...
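
(The converted handlers will need to look something like this sketch - the
parse result can still be checked, but the handler itself must always return
1 so the option is treated as handled:)

	static int __init rtasmsgs_setup(char *str)
	{
		if (kstrtobool(str, 0, &full_rtas_msgs) < 0)
			pr_warn("rtasmsgs: expected on/off/y/n/1/0\n");
		/* returning 0 would push "rtasmsgs=..." into init's environment */
		return 1;
	}
	__setup("rtasmsgs=", rtasmsgs_setup);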

-Kees

>
> -Kees
>
>>
>>
>>> Signed-off-by: Kees Cook 
>>> Acked-by: Heiko Carstens 
>>> Acked-by: Michael Ellerman 
>>> Cc: x...@kernel.org
>>> Cc: linuxppc-...@lists.ozlabs.org
>>> Cc: linux-s...@vger.kernel.org
>>> ---
>>>  arch/powerpc/kernel/rtasd.c  |  9 ++---
>>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
>>>  arch/s390/kernel/time.c  |  8 ++--
>>>  arch/s390/kernel/topology.c  |  7 ++-
>>>  arch/x86/kernel/aperture_64.c| 12 ++--
>>>  include/linux/tick.h |  2 +-
>>>  kernel/time/hrtimer.c| 10 ++
>>>  kernel/time/tick-sched.c | 10 ++
>>>  8 files changed, 15 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>>> index 5a2c049c1c61..567ed5a2f43a 100644
>>> --- a/arch/powerpc/kernel/rtasd.c
>>> +++ b/arch/powerpc/kernel/rtasd.c
>>> @@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
>>>  static unsigned int event_scan;
>>>  static unsigned int rtas_event_scan_rate;
>>>
>>> -static int full_rtas_msgs = 0;
>>> +static bool full_rtas_msgs;
>>>
>>>  /* Stop logging to nvram after first fatal error */
>>>  static int logging_enabled; /* Until we initialize everything,
>>> @@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
>>>
>>>  static int __init rtasmsgs_setup(char *str)
>>>  {
>>> -   if (strcmp(str, "on") == 0)
>>> -   full_rtas_msgs = 1;
>>> -   else if (strcmp(str, "off") == 0)
>>> -   full_rtas_msgs = 0;
>>> -
>>> -   return 1;
>>> +   return kstrtobool(str, 0, &full_rtas_msgs);
>>>  }
>>>  __setup("rtasmsgs=", rtasmsgs_setup);
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> index 32274f72fe3f..b9787cae4108 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> @@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, 
>>> current_state) = CPU_STATE_OFFLINE;
>>>
>>>  static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
>>>
>>> -static int cede_offline_enabled __read_mostly = 1;
>>> +static bool cede_offline_enabled __read_mostly = true;
>>>
>>>  /*
>>>   * Enable/disable cede_offline when available.
>>>   */
>>>  static int __init setup_cede_offline(char *str)
>>>  {
>>> -   if (!strcmp(str, "off"))
>>> -   cede_offline_enabled = 0;
>>> -   else if (!strcmp(str, "on"))
>>> -   cede_offline_enabled = 1;
>>> -   else
>>> -   return 0;
>>> -   return 1;
>>> +   

feature - ip link set primary ?

2016-02-04 Thread James Feeney
Should there be an

 ip link set  primary 

command in the iproute2 package, to set the Primary Slave on a "bond" type link?

It seems that the alternative now is to use the sysfs, with

 echo -n  > /sys/devices/virtual/net//bonding/primary

which, in systemd Service Unit Files, requires "/usr/bin/sh -c 'echo ...'"
rather than simply "/usr/bin/echo ...".  "ip" seems to handle all other "bond"
configuration itself.

Thanks
James



[PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions

2016-02-04 Thread Salam Noureddine

Signed-off-by: Salam Noureddine 
---
 include/net/net_namespace.h | 22 ++
 net/core/net_namespace.c|  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..6dbc0b2 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
struct list_headlist;   /* list of network namespaces */
struct list_headcleanup_list;   /* namespaces on death row */
struct list_headexit_list;  /* Use only net_mutex */
+   struct list_headevent_list; /* net_device notifier list */
 
struct user_namespace   *user_ns;   /* Owning user namespace */
spinlock_t  nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
atomic_inc(&net->fnhe_genid);
 }
 
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+   if (list_empty(&net->event_list))
+   list_add_tail(&net->event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+   list_del_init(&net->event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..58e84ce 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -282,6 +282,7 @@ static __net_init int setup_net(struct net *net, struct 
user_namespace *user_ns)
net->user_ns = user_ns;
idr_init(&net->netns_ids);
spin_lock_init(&net->nsid_lock);
+   INIT_LIST_HEAD(&net->event_list);
 
list_for_each_entry(ops, &pernet_list, list) {
error = ops_init(ops, net);
-- 
1.8.1.4



[PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-04 Thread Salam Noureddine
This can be used to optimize bringing down and unregistering
net_devices by running certain cleanup operations only on the
net namespace instead of on each net_device.

Signed-off-by: Salam Noureddine 
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c| 48 ++-
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814..1b12269 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info {
#define NETDEV_BONDING_INFO     0x0019
#define NETDEV_PRECHANGEUPPER   0x001A
#define NETDEV_CHANGELOWERSTATE 0x001B
+#define NETDEV_UNREGISTER_BATCH 0x001C
+#define NETDEV_DOWN_BATCH       0x001D
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..dbd8995 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1439,6 +1439,8 @@ static int __dev_close(struct net_device *dev)
 int dev_close_many(struct list_head *head, bool unlink)
 {
struct net_device *dev, *tmp;
+   struct net *net, *net_tmp;
+   LIST_HEAD(net_head);
 
/* Remove the devices that don't need to be closed */
list_for_each_entry_safe(dev, tmp, head, close_list)
@@ -1447,13 +1449,22 @@ int dev_close_many(struct list_head *head, bool unlink)
 
__dev_close_many(head);
 
-   list_for_each_entry_safe(dev, tmp, head, close_list) {
+   list_for_each_entry(dev, head, close_list) {
rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
call_netdevice_notifiers(NETDEV_DOWN, dev);
+   }
+
+   list_for_each_entry_safe(dev, tmp, head, close_list) {
+   net_add_event_list(&net_head, dev_net(dev));
if (unlink)
list_del_init(&dev->close_list);
}
 
+   list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+   call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev);
+   net_del_event_list(net);
+   }
+
return 0;
 }
 EXPORT_SYMBOL(dev_close_many);
@@ -1572,12 +1583,17 @@ rollback:
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+   call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+   dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+   net->loopback_dev);
}
 
 outroll:
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
raw_notifier_chain_unregister(&netdev_chain, nb);
goto unlock;
 }
@@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block 
*nb)
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+   call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+   dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+   net->loopback_dev);
}
 unlock:
rtnl_unlock();
@@ -6187,10 +6207,12 @@ void __dev_notify_flags(struct net_device *dev, 
unsigned int old_flags,
rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
 
if (changes & IFF_UP) {
-   if (dev->flags & IFF_UP)
+   if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_UP, dev);
-   else
+   } else {
call_netdevice_notifiers(NETDEV_DOWN, dev);
+   call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev);
+   }
}
 
if (dev->flags & IFF_UP &&
@@ -6427,7 +6449,9 @@ static void net_set_todo(struct net_device *dev)
 static void rollback_registered_many(struct list_head *head)
 {
struct net_device *dev, *tmp;
+   struct net *net, *net_tmp;
LIST_HEAD(close_head);
+   LIST_HEAD(net_head);
 
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
@@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head 
*head)
synchronize_net();
 
list_for_each_entry(dev, head, unreg_list) {
-   struct sk_buff *skb = NULL;
-
/* Shutdown 
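
For illustration, a sketch (not part of this series) of how a notifier
consumer can split its work across the new events: cheap per-device
bookkeeping on NETDEV_DOWN/NETDEV_UNREGISTER, with the expensive walk
deferred to the matching *_BATCH event, which the batched close/unregister
paths above emit once per affected namespace. Patch 4/4 below applies this
pattern to the IPv4 fib; my_mark_dev_down(), my_flush() and the single
needs_flush flag here are invented for the sketch (the real code keeps the
flag per namespace):

#include <linux/netdevice.h>
#include <linux/notifier.h>
#include <net/net_namespace.h>

static bool needs_flush;	/* per-net in real code, see netns_ipv4 in 4/4 */

/* Invented helpers, stubbed out for the sketch: */
static bool my_mark_dev_down(struct net *net, struct net_device *dev)
{
	return true;	/* pretend per-device teardown found work to flush */
}

static void my_flush(struct net *net)
{
	/* the namespace-wide walk (an fib_flush()-like pass) would go here */
}

static int my_netdev_event(struct notifier_block *nb, unsigned long event,
			   void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
	struct net *net = dev_net(dev);

	switch (event) {
	case NETDEV_DOWN:
	case NETDEV_UNREGISTER:
		/* O(1) per device: just remember that a flush is owed. */
		if (my_mark_dev_down(net, dev))
			needs_flush = true;
		break;
	case NETDEV_DOWN_BATCH:
	case NETDEV_UNREGISTER_BATCH:
		/* Batched: after all the per-device events, so the
		 * expensive walk runs once instead of once per device. */
		if (needs_flush) {
			my_flush(net);
			needs_flush = false;
		}
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block my_nb = {
	.notifier_call = my_netdev_event,
};
/* register_netdevice_notifier(&my_nb) from the module's init path */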

[PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister

2016-02-04 Thread Salam Noureddine
Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.

Signed-off-by: Salam Noureddine 
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/fib_frontend.c  | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
 #endif
 #endif
atomic_t        rt_genid;
+   bool            needs_fib_flush;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block 
*this, unsigned long event, vo
unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, event, true);
+   if (fib_sync_down_dev(dev, event, true))
+   net->ipv4.needs_fib_flush = true;
rt_flush_dev(dev);
return NOTIFY_DONE;
}
 
+   if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+   if (net->ipv4.needs_fib_flush) {
+   fib_flush(net);
+   net->ipv4.needs_fib_flush = false;
+   }
+   rt_cache_flush(net);
+   arp_ifdown_all();
+   return NOTIFY_DONE;
+   }
+
in_dev = __in_dev_get_rtnl(dev);
if (!in_dev)
return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, 
unsigned long event, vo
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, event, false);
+   if (fib_sync_down_dev(dev, event, false))
+   net->ipv4.needs_fib_flush = true;
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
-- 
1.8.1.4



[PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown

2016-02-04 Thread Salam Noureddine
Added changes suggested by Julian Anastasov in version 2.

fib_flush walks the whole fib in a net namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removing a large
number of interfaces. These four patches deal with this issue by calling
fib_flush just once per net namespace and by introducing a new function,
arp_ifdown_all, that does a similar optimization for the neighbour table.

The benchmark tests were run on linux-3.18.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
close and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 include/net/net_namespace.h | 22 +
 include/net/netns/ipv4.h|  1 +
 net/core/dev.c  | 48 -
 net/core/neighbour.c| 38 ---
 net/core/net_namespace.c|  1 +
 net/ipv4/arp.c  |  4 
 net/ipv4/fib_frontend.c | 16 +--
 10 files changed, 120 insertions(+), 14 deletions(-)

-- 
1.8.1.4



RE: [net] igbvf: remove "link is Up" message when registering mcast address

2016-02-04 Thread Brown, Aaron F
> From: netdev-ow...@vger.kernel.org [netdev-ow...@vger.kernel.org] on behalf 
> of Jon Maxwell [jmaxwel...@gmail.com]
> Sent: Sunday, January 24, 2016 3:22 PM
> To: Kirsher, Jeffrey T
> Cc: da...@davemloft.net; jmaxw...@redhat.com; vinsc...@redhat.com; 
> intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; 
> linux-ker...@vger.kernel.org; Jon Maxwell
> Subject: [net] igbvf: remove "link is Up" message when registering mcast 
> address
> 
> A similar issue was addressed a few years ago in the following thread:
> 
> http://www.spinics.net/lists/netdev/msg245877.html
> 
> At that time there were concerns that removing this statement may cause other
> side effects. However the submitter addressed those concerns. But the dialogue
> went cold. We have a new case where a customer's application is registering and
> un-registering multicast addresses every few seconds. This is leading to many
> "Link is Up" messages in the logs as a result of the
> "netif_carrier_off(netdev)" statement called by igbvf_msix_other(). Also on
> some kernels it is interfering with the bonding driver causing it to failover
> and subsequently affecting connectivity.
> 
> The SourceForge driver does not make this call and is therefore not affected.
> If there were any side effects I would expect that driver to also be affected.
> I have tested re-loading the igbvf driver and downing the adapter with the PF
> entity on the host where the VM has this patch. When I bring it back up again
> connectivity is restored as expected. Therefore I request that this patch gets
> submitted.
> 
> Signed-off-by: Jon Maxwell 
> ---
>  drivers/net/ethernet/intel/igbvf/netdev.c | 1 -
>  1 file changed, 1 deletion(-)
> 

Tested-by: Aaron Brown 


Re: [PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 3:04 PM, Andy Shevchenko
 wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook  wrote:
>> This changes several users of manual "on"/"off" parsing to use strtobool.
>> (Which means they will now parse y/n/1/0 meaningfully too.)
>>
>
> I like this change, but can you carefully check the acceptance of the
> returned value?
> Briefly I saw 1 or 0 as okay in different places.

Maybe I missed something, but I think this is actually a bug fix. The
two cases are early_param and __setup:

For early_param, the functions are called when walking the command
line in do_early_param via parse_args in parse_early_options. Any
non-zero return values produce a warning (in do_early_param not
parse_args). So this is a bug fix, since the function I touched would
(almost) always return 0, even with bad values (i.e. fixes unreported
bad arguments):

early_param early_parse_stp always 0
early_param early_parse_topology always 0
early_param parse_gart_mem always 0 unless !p (then -EINVAL)

For __setup, these are handled by obsolete_checksetup via
unknown_bootoption via parse_args in start_kernel, as a way to merge
__setup calls that should really be in param (i.e. non-early __setup).
Return values are bubbled up into parse_args and hit:

default:
pr_err("%s: `%s' invalid for parameter `%s'\n",
   doing, val ?: "", param);
break;

So this is also a bug fix, since these __setup functions returned inverted
values or always failed:

__setup rtasmsgs_setup always 1
__setup setup_cede_offline 1 on success, otherwise 0
__setup setup_hrtimer_hres 1 on success, otherwise 0
__setup setup_tick_nohz 1 on success, otherwise 0

So if you specified any of these, they would trigger a bogus "invalid
parameter" report.

I will double-check...

-Kees
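
For reference, a rough userspace approximation of the value parsing this
conversion relies on (accept y/Y/1, n/N/0 and, with this series, "on"/"off",
returning 0 on success and -EINVAL otherwise, like kstrtobool). This is only
an illustration, not the kernel implementation:

/* Rough approximation of the boolean parsing discussed in this thread;
 * not the kernel's strtobool/kstrtobool code. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}

int main(void)
{
	const char *tests[] = { "on", "off", "y", "0", "maybe" };
	bool v;

	for (unsigned i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
		int err = parse_bool(tests[i], &v);
		printf("%-6s -> %s\n", tests[i],
		       err ? "invalid" : (v ? "true" : "false"));
	}
	return 0;
}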

>
>
>> Signed-off-by: Kees Cook 
>> Acked-by: Heiko Carstens 
>> Acked-by: Michael Ellerman 
>> Cc: x...@kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: linux-s...@vger.kernel.org
>> ---
>>  arch/powerpc/kernel/rtasd.c  |  9 ++---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
>>  arch/s390/kernel/time.c  |  8 ++--
>>  arch/s390/kernel/topology.c  |  7 ++-
>>  arch/x86/kernel/aperture_64.c| 12 ++--
>>  include/linux/tick.h |  2 +-
>>  kernel/time/hrtimer.c| 10 ++
>>  kernel/time/tick-sched.c | 10 ++
>>  8 files changed, 15 insertions(+), 53 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>> index 5a2c049c1c61..567ed5a2f43a 100644
>> --- a/arch/powerpc/kernel/rtasd.c
>> +++ b/arch/powerpc/kernel/rtasd.c
>> @@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
>>  static unsigned int event_scan;
>>  static unsigned int rtas_event_scan_rate;
>>
>> -static int full_rtas_msgs = 0;
>> +static bool full_rtas_msgs;
>>
>>  /* Stop logging to nvram after first fatal error */
>>  static int logging_enabled; /* Until we initialize everything,
>> @@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
>>
>>  static int __init rtasmsgs_setup(char *str)
>>  {
>> -   if (strcmp(str, "on") == 0)
>> -   full_rtas_msgs = 1;
>> -   else if (strcmp(str, "off") == 0)
>> -   full_rtas_msgs = 0;
>> -
>> -   return 1;
>> +   return kstrtobool(str, 0, &full_rtas_msgs);
>>  }
>>  __setup("rtasmsgs=", rtasmsgs_setup);
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 32274f72fe3f..b9787cae4108 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, 
>> current_state) = CPU_STATE_OFFLINE;
>>
>>  static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
>>
>> -static int cede_offline_enabled __read_mostly = 1;
>> +static bool cede_offline_enabled __read_mostly = true;
>>
>>  /*
>>   * Enable/disable cede_offline when available.
>>   */
>>  static int __init setup_cede_offline(char *str)
>>  {
>> -   if (!strcmp(str, "off"))
>> -   cede_offline_enabled = 0;
>> -   else if (!strcmp(str, "on"))
>> -   cede_offline_enabled = 1;
>> -   else
>> -   return 0;
>> -   return 1;
>> +   return kstrtobool(str, 0, &cede_offline_enabled);
>>  }
>>
>>  __setup("cede_offline=", setup_cede_offline);
>> diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
>> index 99f84ac31307..dff6ce1b84b2 100644
>> --- a/arch/s390/kernel/time.c
>> +++ b/arch/s390/kernel/time.c
>> @@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
>>  /*
>>   * Server Time Protocol (STP) 

Re: [PATCH 1/2] ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands

2016-02-04 Thread David Miller
From: "Keller, Jacob E" 
Date: Thu, 4 Feb 2016 23:09:56 +

> So you're suggesting instead, to error when the second operation
> (change number of queues) would fail the current settings?

Yes.

This is absolutely required.


RE: bonding reports interface up with 0 Mbps

2016-02-04 Thread Tantilov, Emil S
>-Original Message-
>From: Jay Vosburgh [mailto:jay.vosbu...@canonical.com]
>Sent: Thursday, February 04, 2016 4:37 PM
>To: Tantilov, Emil S
>Cc: netdev@vger.kernel.org; go...@cumulusnetworks.com; zhuyj;
>j...@mellanox.com
>Subject: Re: bonding reports interface up with 0 Mbps
>
>Jay Vosburgh  wrote:
>[...]
>>  Thinking about the trace again... Emil: what happens in the
>>trace before this?  Is there ever a call to the ixgbe_get_settings?
>>Does a NETDEV_UP or NETDEV_CHANGE event ever hit the bond_netdev_event
>>function?
>
>   Emil kindly sent me the trace offline, and I think I see what's
>going on.  It looks like the sequence of events is:
>
>bond_enslave ->
>   bond_update_speed_duplex (device is down, thus DUPLEX/SPEED_UNKNOWN)
>   [ do rest of enslavement, start miimon periodic work ]
>
>   [ time passes, device goes carrier up ]
>
>ixgbe_service_task: eth1: NIC Link is Up 10 Gbps ->
>   netif_carrier_on (arranges for NETDEV_CHANGE notifier out of line)
>
>   [ a few microseconds later ]
>
>bond_mii_monitor ->
>   bond_check_dev_link (now is carrier up)
>   bond_miimon_commit ->   (emits "0 Mbps full duplex" message)
>   bond_lower_state_changed ->
>   bond_netdev_event (NETDEV_CHANGELOWERSTATE, is ignored)
>   bond_3ad_handle_link_change (sees DUPLEX/SPEED_UNKNOWN)
>
>   [ a few microseconds later, in response to ixgbe's netif_carrier_on ]
>
>notifier_call_chain ->
>   bond_netdev_event NETDEV_CHANGE ->
>   bond_update_speed_duplex (sees correct SPEED_10000/FULL) ->
>   bond_3ad_adapter_speed_duplex_changed (updates 802.3ad)
>
>   Basically, the race is that the periodic bond_mii_monitor is
>squeezing in between the link going up and bonding's update of the speed
>and duplex in response to the NETDEV_CHANGE triggered by the driver's
>netif_carrier_on call.  bonding ends up using the stale duplex and speed
>information obtained at enslavement time.
>
>   I think that, nowadays, the initial speed and duplex will pretty
>much always be UNKNOWN, at least for real Ethernet devices, because it
>will take longer to autoneg than the time between the dev_open and
>bond_update_speed_duplex calls in bond_enslave.
>
>   Adding a case to bond_netdev_event for CHANGELOWERSTATE works
>because it's a synchronous call from bonding.  For purposes of fixing
>this, it's more or less equivalent to calling bond_update_speed_duplex
>from bond_miimon_commit (which is part of a test patch I posted earlier
>today).
>
>   If the above analysis is correct, then I would expect this patch
>to make the problem go away:
>
>diff --git a/drivers/net/bonding/bond_main.c
>b/drivers/net/bonding/bond_main.c
>index 56b560558884..cabaeb61333d 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2127,6 +2127,7 @@ static void bond_miimon_commit(struct bonding *bond)
>   continue;
>
>   case BOND_LINK_UP:
>+  bond_update_speed_duplex(slave);
>   bond_set_slave_link_state(slave, BOND_LINK_UP,
> BOND_SLAVE_NOTIFY_NOW);
>   slave->last_link_up = jiffies;
>
>
>   Emil, can you give just the above a test?

Sure I'll fire it up.

Thanks,
Emil



Re: bonding reports interface up with 0 Mbps

2016-02-04 Thread zhuyj

On 02/05/2016 08:37 AM, Jay Vosburgh wrote:

Jay Vosburgh  wrote:
[...]

Thinking about the trace again... Emil: what happens in the
trace before this?  Is there ever a call to the ixgbe_get_settings?
Does a NETDEV_UP or NETDEV_CHANGE event ever hit the bond_netdev_event
function?

Emil kindly sent me the trace offline, and I think I see what's
going on.  It looks like the sequence of events is:

bond_enslave ->
bond_update_speed_duplex (device is down, thus DUPLEX/SPEED_UNKNOWN)
[ do rest of enslavement, start miimon periodic work ]

[ time passes, device goes carrier up ]

ixgbe_service_task: eth1: NIC Link is Up 10 Gbps ->
netif_carrier_on (arranges for NETDEV_CHANGE notifier out of line)

[ a few microseconds later ]

bond_mii_monitor ->
bond_check_dev_link (now is carrier up)
bond_miimon_commit ->(emits "0 Mbps full duplex" message)
bond_lower_state_changed ->
bond_netdev_event (NETDEV_CHANGELOWERSTATE, is ignored)
bond_3ad_handle_link_change (sees DUPLEX/SPEED_UNKNOWN)

[ a few microseconds later, in response to ixgbe's netif_carrier_on ]

notifier_call_chain ->
bond_netdev_event NETDEV_CHANGE ->
bond_update_speed_duplex (sees correct SPEED_10000/FULL) ->
bond_3ad_adapter_speed_duplex_changed (updates 802.3ad)

Basically, the race is that the periodic bond_mii_monitor is
squeezing in between the link going up and bonding's update of the speed
and duplex in response to the NETDEV_CHANGE triggered by the driver's
netif_carrier_on call.  bonding ends up using the stale duplex and speed
information obtained at enslavement time.

I think that, nowadays, the initial speed and duplex will pretty
much always be UNKNOWN, at least for real Ethernet devices, because it
will take longer to autoneg than the time between the dev_open and
bond_update_speed_duplex calls in bond_enslave.

Adding a case to bond_netdev_event for CHANGELOWERSTATE works
because it's a synchronous call from bonding.  For purposes of fixing
this, it's more or less equivalent to calling bond_update_speed_duplex
from bond_miimon_commit (which is part of a test patch I posted earlier
today).

If the above analysis is correct, then I would expect this patch
to make the problem go away:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b560558884..cabaeb61333d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2127,6 +2127,7 @@ static void bond_miimon_commit(struct bonding *bond)
			continue;

		case BOND_LINK_UP:
+			bond_update_speed_duplex(slave);
			bond_set_slave_link_state(slave, BOND_LINK_UP,
						  BOND_SLAVE_NOTIFY_NOW);
			slave->last_link_up = jiffies;


Emil, can you give just the above a test?

I don't see in the trace that there's evidence that ixgbe's link
is rapidly flapping, so I don't think it's necessary to do more than the
above.


Sure. I agree with you. I expect this can solve this problem.

Thanks a lot.
Zhu Yanjun



Now, separately, bonding really should obey the NETDEV_CHANGE /
NETDEV_UP events instead of polling for carrier state, but if the above
patch works it's a simple fix that is easily backported, which the
CHANGELOWERSTATE method isn't, and the new way (notifier driven) can be
net-next material.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com




  1   2   >