Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().

2016-06-30 Thread Eric Dumazet
On Fri, 2016-07-01 at 11:06 +0800, Ding Tianhong wrote:

> I debugged this problem and found that __fib6_clean_all() does not
> hold the cpu for more than 1 second even though there
> are a lot of ipv6 addresses to deal with, but the notifier chain
> calls the ipv6 notifier several times and holds the cpu
> for a long time, so adding cond_resched() in addrconf_ifdown() could
> solve the problem correctly. I think your first solution
> is a good way to fix this bug.

I am traveling these days, so please send an official patch once you've
tested it, thanks !





Re: [PATCH net] net: poll tx timeout only on active tx queues

2016-06-30 Thread Eric Dumazet
On Fri, 2016-07-01 at 04:50 +, Yuval Mintz wrote:
> > Currently all device drivers call netif_tx_start_all_queues(dev)
> > on open to work around this issue, which is strange since only
> > real_num_tx_queues are active.
> 
> You could also argue that netif_tx_start_all_queues() should
> only enable the real_num_tx_queues.
> [Although that would obviously cause all drivers to reach the
> 'problem' you're currently fixing].

Yep. Basically what I pointed out.

It seems inconsistent to have loops using num_tx_queues, and others
using real_num_tx_queues.

Instead of 'fixing' one of them, we should take a deeper look, even if
the change looks fine.

num_tx_queues should be used in code that runs once, like
netdev_lockdep_set_classes(), but other loops should probably use
real_num_tx_queues.

Anyway all these changes should definitely target net-next, not net
tree.
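
For illustration only, here is a minimal user-space sketch of the distinction discussed above: a watchdog-style scan that walks either every allocated queue or only the active ones. The types and names (struct fake_netdev, struct fake_txq, count_stalled) are hypothetical stand-ins and are not kernel code; only the two counters, num_tx_queues and real_num_tx_queues, come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TXQ 8

/* Hypothetical stand-in for the per-queue state a watchdog would inspect. */
struct fake_txq {
	bool stopped;              /* queue was stopped by the driver */
	unsigned long trans_start; /* tick of the last transmit */
};

/* Hypothetical stand-in for a net device: only the two counters matter. */
struct fake_netdev {
	unsigned int num_tx_queues;      /* queues allocated at setup time */
	unsigned int real_num_tx_queues; /* queues currently in use */
	struct fake_txq txq[MAX_TXQ];
};

/*
 * Count queues that look stalled among the first 'limit' queues.
 * A queue is treated as stalled when it is stopped and its last
 * transmit is older than 'timeout' ticks.
 */
static unsigned int count_stalled(const struct fake_netdev *dev,
				  unsigned int limit,
				  unsigned long now, unsigned long timeout)
{
	unsigned int i, stalled = 0;

	for (i = 0; i < limit && i < MAX_TXQ; i++) {
		const struct fake_txq *q = &dev->txq[i];

		if (q->stopped && now - q->trans_start > timeout)
			stalled++;
	}
	return stalled;
}
```

Passing dev->num_tx_queues as the limit also inspects queues that were allocated but never used, while passing dev->real_num_tx_queues restricts the scan to the active ones, which is the behaviour argued for above.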





Re: [PATCH net] net: poll tx timeout only on active tx queues

2016-06-30 Thread Yuval Mintz
> Currently all device drivers call netif_tx_start_all_queues(dev)
> on open to work around this issue, which is strange since only
> real_num_tx_queues are active.

You could also argue that netif_tx_start_all_queues() should
only enable the real_num_tx_queues.
[Although that would obviously cause all drivers to reach the
'problem' you're currently fixing].


[BUG] Panic on boot in ixgbe with Xeon D-1531

2016-06-30 Thread Patrick McLean
Hi,

We are getting a panic on boot with Linus git as of this morning. I
have attached the boot log, it looks like the panic is in igbvf/ixgbe.
The machine is being netbooted via legacy PXE.

I have attached the full boot log from a kernel with igbvf enabled,
and one log of just the panic message with igbvf disabled. Please let
me know if you need any more information.
[0.00] Command line: BOOT_IMAGE=vmlinuz 
X
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00099bff] usable
[0.00] BIOS-e820: [mem 0x00099c00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x79978fff] usable
[0.00] BIOS-e820: [mem 0x79979000-0x79b04fff] reserved
[0.00] BIOS-e820: [mem 0x79b05000-0x79bf0fff] usable
[0.00] BIOS-e820: [mem 0x79bf1000-0x7a0e8fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7a0e9000-0x7bd96fff] reserved
[0.00] BIOS-e820: [mem 0x7bd97000-0x7bd97fff] usable
[0.00] BIOS-e820: [mem 0x7bd98000-0x7be1dfff] reserved
[0.00] BIOS-e820: [mem 0x7be1e000-0x7bff] usable
[0.00] BIOS-e820: [mem 0x7c00-0x8fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00107fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 3.0 present.
[0.00] e820: last_pfn = 0x1080 000 max_arch_pf0 00
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
[0.00] e820: last_pfn = 0x7c000 max_arch_pfn = 0x4
[0.00] Scanning 1 areas for low memory corruption
[0.00] Using GB pages for direct mapping
[0.00] RAMDISK: [mem 0x78b1c000-0x79978fff]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F05B0 24 (v02 ALASKA)
[0.00] ACPI: XSDT 0x79C3D0A8 D4 (v01 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: FACP 0x79C6E038 00010C (v05 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: DSDT 0x79C3D218 030E1B (v02 ALASKA A M I 01072009 INTL 20091013)
[0.00] ACPI: FACS 0x7A0E7F80 40
[0.00] ACPI: APIC 0x79C6E148 000100 (v03 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: FPDT 0x79C6E248 44 (v01 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: FIDT 0x79C6E290 9C (v01 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: SPMI 0x79C6E330 41 (v05 ALASKA A M I AMI. )
[0.00] ACPI: MCFG 0x79C6E378 3C (v01 ALASKA A M I 01072009 MSFT 0097)
[0.00] ACPI: UEFI 0x79C6E3B8 42 (v01 )
[0.00] ACPI: DBG2 0x79C6E400 72 (v00 ALASKA A M I INTL 20091013)
[0.00] ACPI: HPET 0x79C6E478 38 (v01 ALASKA A M I 0001 INTL 20091013)
[0.00] ACPI: MSCT 0x79C6E4B0 90 (v01 ALASKA A M I 0001 INTL 20091013)
[0.00] ACPI: SLIT 0x79C6E540 2D (v01 ALASKA A M I 0001 INTL 20091013)
[0.00] ACPI: SRAT 0x79C6E570 001158 (v03 ALASKA A M I 0001 INTL 20091013)
[0.00] ACPI: WDDT 0x79C6F6C8 40 (v01 ALASKA A M I INTL 20091013)
[0.00] ACPI: SSDT 0x79C6F708 00EDE7 (v01 AMIPmMgt 0001 INTL 20120913)
[0.00] ACPI: SSDT 0x79C7E4F0 0020CB (v02 ALASKA SpsNm 0002 INTL 20120913)
[0.00] ACPI: SSDT 0x79C805C0 64 (v02 ALASKA SpsNvs 0002 INTL 20120913)
[0.00] ACPI: PRAD 0x79C80628 000102 (v02 ALASKA A M I 0002 INTL 20120913)
[0.00] ACPI: DMAR 0x79C80730 BC (v01 ALASKA A M I 0001 INTL

Re: [PATCH net-next V2 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread Or Gerlitz
On Fri, Jul 1, 2016 at 3:45 AM, John Fastabend  wrote:
> On 16-06-30 08:23 AM, Saeed Mahameed wrote:
>> From: Or Gerlitz 
>>
>> Add the commands to set and show the mode of SRIOV E-Switch, two modes
>> are supported:
>>
>> * legacy: operating in the "old" L2 based mode (DMAC --> VF vport)
>>
>> * switchdev: the E-Switch is referred to as whitebox switch configured
>> using standard tools such as tc, bridge, openvswitch etc. To allow
>> working with the tools, for each VF, a VF representor netdevice is
>> created by the E-Switch manager vendor device driver instance (e.g PF).

> OK I can't come up with a better name and Jiri/Or convinced me this
> should work ok so this works for me.

cool.

> One question though going forward. We have devices with multiple
> "switches" in them how does this work in a devlink environment? Do
> we need some way to enumerate the switches and identify them. In
> which case this attribute would be a global setting.

Devices which expose single PCI function for managing multiple switches?

AFAIK the driver for this HW is not upstream yet, there's no real
legacy around them. Since we agree the new mode is the way to go,
global setting should be fine here, I think.

Or.


Re: [PATCHv2 net-next 1/3] net: Add provision to specify pf number while assigning VF mac

2016-06-30 Thread Hariprasad Shenai
On Thu, Jun 30, 2016 at 19:04:16 +, Yuval Mintz wrote:
> > Chelsio T4/T5 cards have SR-IOV Capabilities on Physical Functions
> > 0..3 and the administrative Driver(cxgb4) attaches to Physical Function 4.
> > Each of the Physical Functions 0..3 can support up to 16 Virtual
> > Functions. With the current Linux APIs, a 2-Port card would only be
> > able to use the Virtual Functions on Physical Functions 0..1 and not
> > allow the Virtual Functions on Physical Functions 2..3 to be used since
> > there are no Ports 2..3 on a 2-Port card.
> >
> > Also, the current ip command takes a netdev as one of its arguments, and
> > it assumes a 1-to-1 mapping of Network Ports, Physical Functions and the
> > SR-IOV Virtual Functions of those Physical Functions. But it is not
> > true in our case and won't work for us.
> > 
> > Added a new argument to specify the PF number associated with the VF, to
> > fix this.
> 
> I don't get it - what's the exact definition of 'Physical Function'?
> Are we talking PCI functions? Logical partitions? Something else?

It's the PCIe physical function. Physical functions (PFs) are full-featured
PCIe functions; virtual functions (VFs) are “lightweight” functions that
lack configuration resources.


Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().

2016-06-30 Thread Ding Tianhong
On 2016/6/28 14:27, Eric Dumazet wrote:
> On Tue, 2016-06-28 at 08:22 +0200, Eric Dumazet wrote:
> 
>> Follow the stack trace and add another cond_resched() where it is needed
>> then ?
>>
>> Lot of this code was written decade ago where nobody expected a root
>> user was going to try hard to crash its host ;)
>>
>> I did not check if the following is valid (Maybe __fib6_clean_all() is
>> called with some spinlock/rwlock held)
> 
> Well, fib6_run_gc() can call it with
> spin_lock_bh(&net->ipv6.fib6_gc_lock) so this won't work.
> 
> We need more invasive changes.
> 
> 
> 
Hi Eric:

I debugged this problem and found that __fib6_clean_all() does not hold
the cpu for more than 1 second even though there are a lot of ipv6
addresses to deal with, but the notifier chain calls the ipv6 notifier
several times and holds the cpu for a long time, so adding
cond_resched() in addrconf_ifdown() could solve the problem correctly.
I think your first solution is a good way to fix this bug.

Thanks
Ding



Re: [PATCH 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-06-30 Thread Rob Herring
On Thu, Jun 30, 2016 at 06:59:13PM -0400, Jon Mason wrote:
> Signed-off-by: Jon Mason 
> ---
>  .../devicetree/bindings/net/brcm,bgmac-nsp.txt | 24 
> ++
>  1 file changed, 24 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt b/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
> new file mode 100644
> index 000..022946c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
> @@ -0,0 +1,24 @@
> +Broadcom GMAC Ethernet Controller Device Tree Bindings
> +-
> +
> +Required properties:
> + - compatible:   "brcm,bgmac-nsp"

Usually we do - order.

> + - reg:  Address and length of the GMAC registers,
> + Address and length of the GMAC IDM registers
> + - reg-names:Names of the registers.  Must have both "gmac_base" and
> + "idm_base"
> + - interrupts:   Interrupt number
> +
> +Optional properties:
> +- mac-address:   See ethernet.txt file in the same directory
> +
> +Examples:
> +
> +gmac0: ethernet@18022000 {
> + compatible = "brcm,bgmac-nsp";
> + reg = <0x18022000 0x1000>,
> +   <0x1811 0x1000>;
> + reg-names = "gmac_base", "idm_base";
> + interrupts = ;
> + status = "disabled";
> +};
> -- 
> 1.9.1
> 


Re: [PATCH v2 net-next 1/9] MAINTAINERS: add maintainers for hns driver

2016-06-30 Thread Daode Huang



On 2016/6/30 18:48, Joe Perches wrote:

On Thu, 2016-06-30 at 15:25 +0800, Yisen Zhuang wrote:

From: Daode Huang 

This patch adds maintainers for hisilicon network subsystem driver

[]

diff --git a/MAINTAINERS b/MAINTAINERS

[]

@@ -5421,6 +5421,15 @@ F:   include/uapi/linux/if_hippi.h
  F:net/802/hippi.c
  F:drivers/net/hippi/
  
+HISILICON NETWORK SUBSYSTEM DRIVER

+M: Yisen Zhuang 
+M: Salil Mehta 
+L: netdev@vger.kernel.org
+W: http://www.hisilicon.com
+S: Maintained
+F: drivers/net/ethernet/hisilicon/*

From MAINTAINERS:

F: Files and directories with wildcard patterns.
   A trailing slash includes all files and subdirectory files.
   F:   drivers/net/    all files in and below drivers/net
   F:   drivers/net/*   all files in drivers/net, but not below
   F:   */net/* all files in "any top level directory"/net
   One pattern per line.  Multiple F: lines acceptable.

So this pattern with the trailing /* matches only:

drivers/net/ethernet/hisilicon/Kconfig
drivers/net/ethernet/hisilicon/Makefile
drivers/net/ethernet/hisilicon/hip04_eth.c
drivers/net/ethernet/hisilicon/hix5hd2_gmac.c

and not any file in drivers/net/ethernet/hisilicon/hns/

Is that what you want?


We will maintain all files in hisilicon and any directory below it.
Will fix it in the next submission.

Thanks


using

F:  drivers/net/ethernet/hisilicon/

matches all files in hisilicon and any directory below it.
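
To make the difference between the two pattern forms concrete, here is a small user-space sketch of the documented rules. It only approximates the MAINTAINERS description quoted above and is not how get_maintainer.pl is implemented; the function name f_pattern_matches and the file name hns/example.c are hypothetical. It relies on POSIX fnmatch() with FNM_PATHNAME so that '*' does not match across a '/'.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/*
 * Approximate the MAINTAINERS "F:" rules:
 *  - a pattern with a trailing slash matches every file in and below
 *    that directory (simple prefix match);
 *  - any other pattern is a glob in which '*' does not cross '/'.
 */
static bool f_pattern_matches(const char *pattern, const char *path)
{
	size_t len = strlen(pattern);

	if (len > 0 && pattern[len - 1] == '/')
		return strncmp(path, pattern, len) == 0;

	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

With this sketch, "drivers/net/ethernet/hisilicon/*" matches drivers/net/ethernet/hisilicon/Kconfig but not drivers/net/ethernet/hisilicon/hns/example.c, while "drivers/net/ethernet/hisilicon/" matches both.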




Re: [PATCH V6 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-30 Thread Rob Herring
On Wed, Jun 29, 2016 at 02:10:13AM -0700, th...@altera.com wrote:
> From: Tien Hock Loh 
> 
> This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
> the dwmac is set to sgmii.
> 
> Signed-off-by: Tien Hock Loh 
> 
> ---
> v2:
> - Refactored the TSE PCS out from the dwmac-socfpga.c file
> - Added binding documentation for TSE PCS sgmii adapter
> v3:
> - Added missing license header for new source files
> - Updated tse_pcs.h include headers
> - Standardize if statements
> v4:
> - Reset SGMII adapter on speed change
> - Do not enable SGMII adapter if speed is not supported
> - On init, if PCS reset fails, do not enable adapter
> v5:
> - Fixed devicetree binding property name using _ instead of -
> v6:
> - Fixed a problem where driver build broken if driver is set as module
> ---
>  .../devicetree/bindings/net/socfpga-dwmac.txt  |  19 ++

Acked-by: Rob Herring 

>  drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
>  drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c | 276 
> +
>  drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h |  36 +++
>  .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c| 141 +--
>  5 files changed, 453 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c
>  create mode 100644 drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h


RE: [PATCH 1/1] qlcnic: add wmb() call in transmit data path.

2016-06-30 Thread Sony Chacko
Subject: Re: [PATCH 1/1] qlcnic: add wmb() call in transmit data path.

>
>> +/* Ensure writes are complete before HW fetches Tx descriptors */
>> +wmb();
>>   qlcnic_update_cmd_producer(tx_ring);
>>
>>   return NETDEV_TX_OK;
>>
>
> Would not an mmiowb be more appropriate in this case?
>
> Regards,
> Lino

Sorry, this was nonsense. This should be "dma_wmb" not "mmiowb".

Lino,

The patch is based on this kernel documentation. 

https://www.kernel.org/doc/Documentation/memory-barriers.txt

/* force memory to sync before notifying device via MMIO */
wmb();

/* notify device of new descriptors */
writel(DESC_NOTIFY, doorbell);
}
The wmb() is needed to guarantee that the cache coherent memory writes have 
completed before attempting a write to the cache incoherent MMIO region.

Thanks,
Sony
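
As a loose user-space analogy of the ordering described above (fill in the descriptor, issue a barrier, then ring the doorbell), the sketch below uses a C11 release fence as a stand-in for wmb() and an atomic variable as a stand-in for the MMIO doorbell register. This is an assumption-laden illustration only: a C11 fence orders ordinary memory between threads and says nothing about MMIO or DMA, and struct fake_ring, ring_init and post_descriptor are hypothetical names, not qlcnic code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4

/* Hypothetical Tx descriptor as the device would read it. */
struct fake_desc {
	uint64_t addr;
	uint32_t len;
};

struct fake_ring {
	struct fake_desc desc[RING_SIZE];
	atomic_uint doorbell; /* stand-in for the MMIO producer register */
};

static void ring_init(struct fake_ring *ring)
{
	memset(ring->desc, 0, sizeof(ring->desc));
	atomic_init(&ring->doorbell, 0);
}

static void post_descriptor(struct fake_ring *ring, unsigned int idx,
			    uint64_t addr, uint32_t len)
{
	idx %= RING_SIZE;

	/* 1. Fill in the descriptor in ordinary memory. */
	ring->desc[idx].addr = addr;
	ring->desc[idx].len = len;

	/* 2. Stand-in for wmb(): descriptor writes must be visible first. */
	atomic_thread_fence(memory_order_release);

	/* 3. Ring the doorbell only after the barrier. */
	atomic_store_explicit(&ring->doorbell, idx + 1, memory_order_relaxed);
}
```

The point of the sketch is only the order of the three steps; without step 2, a consumer that observes the doorbell could in principle still see a stale descriptor.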


Re: [PATCH v2 net] bonding: prevent out of bound accesses

2016-06-30 Thread Ding Tianhong
On 2016/6/30 22:13, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> ether_addr_equal_64bits() requires some care about its arguments,
> namely that 8 bytes might be read, even if last 2 byte values are not
> used.
> 
> KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
> in bond_3ad.c
> 
> Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c :
> Although the 8-byte alignment was there, KASan would detect out
> of bound accesses.
> 
> Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare")
> Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr")
> Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB")
> Signed-off-by: Eric Dumazet 
> Reported-by: Dmitry Vyukov 
> ---
>  drivers/net/bonding/bond_3ad.c |   11 +++
>  drivers/net/bonding/bond_alb.c |7 ++-
>  include/net/bonding.h  |7 ++-
>  3 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index ca81f46ea1aa..8ad491ab1d01 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -101,11 +101,14 @@ enum ad_link_speed_type {
>  #define MAC_ADDRESS_EQUAL(A, B)  \
>   ether_addr_equal_64bits((const u8 *)A, (const u8 *)B)
>  
> -static struct mac_addr null_mac_addr = { { 0, 0, 0, 0, 0, 0 } };
> +static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
> + 0, 0, 0, 0, 0, 0
> +};
>  static u16 ad_ticks_per_sec;
>  static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
>  
> -static const u8 lacpdu_mcast_addr[ETH_ALEN] = MULTICAST_LACPDU_ADDR;
> +static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
> + MULTICAST_LACPDU_ADDR;
>  
>  /* = main 802.3ad protocol functions == */
>  static int ad_lacpdu_send(struct port *port);
> @@ -1739,7 +1742,7 @@ static void ad_clear_agg(struct aggregator *aggregator)
>   aggregator->is_individual = false;
>   aggregator->actor_admin_aggregator_key = 0;
>   aggregator->actor_oper_aggregator_key = 0;
> - aggregator->partner_system = null_mac_addr;
> + eth_zero_addr(aggregator->partner_system.mac_addr_value);
>   aggregator->partner_system_priority = 0;
>   aggregator->partner_oper_aggregator_key = 0;
>   aggregator->receive_state = 0;
> @@ -1761,7 +1764,7 @@ static void ad_initialize_agg(struct aggregator *aggregator)
>   if (aggregator) {
>   ad_clear_agg(aggregator);
>  
> - aggregator->aggregator_mac_address = null_mac_addr;
> + eth_zero_addr(aggregator->aggregator_mac_address.mac_addr_value);
>   aggregator->aggregator_identifier = 0;
>   aggregator->slave = NULL;
>   }
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index c5ac160a8ae9..551f0f8dead3 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -42,13 +42,10 @@
>  
>  
>  
> -#ifndef __long_aligned
> -#define __long_aligned __attribute__((aligned((sizeof(long)))))
> -#endif
> -static const u8 mac_bcast[ETH_ALEN] __long_aligned = {
> +static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = {
>   0xff, 0xff, 0xff, 0xff, 0xff, 0xff
>  };
> -static const u8 mac_v6_allmcast[ETH_ALEN] __long_aligned = {
> +static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
>   0x33, 0x33, 0x00, 0x00, 0x00, 0x01
>  };
>  static const int alb_delta_in_ticks = HZ / ALB_TIMER_TICKS_PER_SEC;
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index 791800ddd6d9..6360c259da6d 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -34,6 +34,9 @@
>  
>  #define BOND_DEFAULT_MIIMON  100
>  
> +#ifndef __long_aligned
> +#define __long_aligned __attribute__((aligned((sizeof(long)))))
> +#endif
>  /*
>   * Less bad way to call ioctl from within the kernel; this needs to be
>   * done some other way to get the call out of interrupt context.
> @@ -138,7 +141,9 @@ struct bond_params {
>   struct reciprocal_value reciprocal_packets_per_slave;
>   u16 ad_actor_sys_prio;
>   u16 ad_user_port_key;
> - u8 ad_actor_system[ETH_ALEN];
> +
> + /* 2 bytes of padding : see ether_addr_equal_64bits() */
> + u8 ad_actor_system[ETH_ALEN + 2];
>  };
>  
>  struct bond_parm_tbl {
> 
> 
> 
> .
It is fine to me.

Acked-by: Ding Tianhong 
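
For readers wondering why the two bytes of padding matter, here is a minimal user-space sketch of a 64-bit MAC comparison. It is not the kernel's ether_addr_equal_64bits() (which, as far as I recall, shifts the unused bytes away rather than masking them); mac_equal_64bits is a hypothetical name, and the mask is built with memcpy so the sketch does not depend on endianness. The key property is the same: each argument is read as 8 bytes, so a 6-byte array would be read out of bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/*
 * Compare two MAC addresses with one 64-bit load per argument.
 * Both buffers must provide at least ETH_ALEN + 2 readable bytes;
 * the last two bytes are loaded but masked out of the comparison.
 */
static bool mac_equal_64bits(const uint8_t *a, const uint8_t *b)
{
	static const uint8_t mask_bytes[8] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00
	};
	uint64_t wa, wb, mask;

	memcpy(&wa, a, sizeof(wa)); /* reads 8 bytes, not 6 */
	memcpy(&wb, b, sizeof(wb));
	memcpy(&mask, mask_bytes, sizeof(mask));

	return ((wa ^ wb) & mask) == 0;
}
```

Declaring the constants as u8 name[ETH_ALEN + 2], as the patch does, is what keeps those 8-byte reads inside the object.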
> 




Re: [PATCH net-next V2 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread John Fastabend
On 16-06-30 08:23 AM, Saeed Mahameed wrote:
> From: Or Gerlitz 
> 
> Add the commands to set and show the mode of SRIOV E-Switch, two modes
> are supported:
> 
> * legacy: operating in the "old" L2 based mode (DMAC --> VF vport)
> 
> * switchdev: the E-Switch is referred to as whitebox switch configured
> using standard tools such as tc, bridge, openvswitch etc. To allow
> working with the tools, for each VF, a VF representor netdevice is
> created by the E-Switch manager vendor device driver instance (e.g PF).
> 
> Signed-off-by: Or Gerlitz 
> Signed-off-by: Saeed Mahameed 
> ---


OK I can't come up with a better name and Jiri/Or convinced me this
should work ok so this works for me.

One question though going forward. We have devices with multiple
"switches" in them how does this work in a devlink environment? Do
we need some way to enumerate the switches and identify them. In
which case this attribute would be a global setting.

Thanks,
John





Re: [PATCH iproute2] bridge: man: fix STP LISTENING description

2016-06-30 Thread Stephen Hemminger
On Wed, 29 Jun 2016 19:26:29 +
Vivien Didelot  wrote:

> Correct the unclear and poorly conjugated STP LISTENING documentation.
> 
> Signed-off-by: Vivien Didelot 

Applied


Re: [PATCH] ip route: timeout for routes has to be set in seconds

2016-06-30 Thread Stephen Hemminger
On Tue, 28 Jun 2016 23:27:14 +
Andrey Vagin  wrote:

> From: Andrew Vagin 
> 
> Currently a timeout is multiplied by HZ in user-space and
> then it multiplied by HZ in kernel-space.
> 
> $ ./ip/ip r add 2002::0/64 dev veth1 expires 10
> $ ./ip/ip -6 r
> 2002::/64 dev veth1  metric 1024 linkdown  expires 996sec pref medium
> 
> Cc: Xin Long 
> Cc: Hangbin Liu 
> Cc: Stephen Hemminger 
> Fixes: 68eede250500 ("route: allow routes to be configured with expire values")
> Signed-off-by: Andrew Vagin 

Applied.
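
As a rough illustration of the double scaling described in the commit message, the sketch below works through the arithmetic. It assumes HZ is 100 in user space (an assumption, although it is consistent with the 996sec shown above once a few seconds have elapsed); the function names are hypothetical and are not taken from iproute2 or the kernel.

```c
#include <assert.h>

/* Value handed to the kernel when user space wrongly pre-multiplies by HZ. */
static unsigned long expires_sent_buggy(unsigned long seconds, unsigned long hz)
{
	return seconds * hz;
}

/* Value handed to the kernel when user space passes plain seconds. */
static unsigned long expires_sent_fixed(unsigned long seconds)
{
	return seconds;
}

/* Remaining lifetime shown after 'elapsed' seconds, never negative. */
static unsigned long expires_remaining(unsigned long sent, unsigned long elapsed)
{
	return elapsed < sent ? sent - elapsed : 0;
}
```

Under that assumption, "expires 10" is sent as 1000, the kernel treats it as 1000 seconds, and about four seconds later the route shows 996 seconds remaining instead of roughly 6.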



Re: [PATCH iproute2] bridge: man: fix BPUD typo

2016-06-30 Thread Stephen Hemminger
On Wed, 29 Jun 2016 19:26:10 +
Vivien Didelot  wrote:

> s/BPUD/BPDU/ in guard description.
> 
> Signed-off-by: Vivien Didelot 

Applied


Re: [PATCHv2 net-next 1/3] net: Add provision to specify pf number while assigning VF mac

2016-06-30 Thread Yuval Mintz
> Chelsio T4/T5 cards have SR-IOV Capabilities on Physical Functions
> 0..3 and the administrative Driver(cxgb4) attaches to Physical Function 4.
> Each of the Physical Functions 0..3 can support up to 16 Virtual
> Functions. With the current Linux APIs, a 2-Port card would only be
> able to use the Virtual Functions on Physical Functions 0..1 and not
> allow the Virtual Functions on Physical Functions 2..3 to be used since
> there are no Ports 2..3 on a 2-Port card.
>
> Also, the current ip command takes a netdev as one of its arguments, and
> it assumes a 1-to-1 mapping of Network Ports, Physical Functions and the
> SR-IOV Virtual Functions of those Physical Functions. But it is not
> true in our case and won't work for us.
> 
> Added a new argument to specify the PF number associated with the VF, to
> fix this.

I don't get it - what's the exact definition of 'Physical Function'?
Are we talking PCI functions? Logical partitions? Something else?


Re: [PATCH iproute2 net-next] bridge: vlan: add support to display per-vlan statistics

2016-06-30 Thread Stephen Hemminger
On Tue, 21 Jun 2016 18:11:59 +0200
Nikolay Aleksandrov  wrote:

> >> Thanks, this is a useful tool, but I think the formatting of output may need to be
> >> reworked.  The bridge tool works similar to ip command. And in the ip command the
> >> -s flag causes additional lines, but does not change the output format.
> > Indeed, I agree that it needs refinement.
> >   
> 
> Or alternatively I can make it:
> $ bridge vlan stats
> a subcommand instead of using the "-s" argument in order to be consistent.
> So it can have its own format.

Why not:

$ bridge -s vlan show

to be consistent with:

$ ip -s li show


Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Cong Wang
On Thu, Jun 30, 2016 at 4:26 PM, Cong Wang  wrote:
> On Thu, Jun 30, 2016 at 4:11 PM, Daniel Borkmann  wrote:
>> On 07/01/2016 12:42 AM, Cong Wang wrote:
>>>
>>> On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann 
>>> wrote:


>>>> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
>>>> to the header then? Both seem similarly small at least (could be split
>>>> f.e into two patches then, first for the move, second for the actual
>>>> fix).
>>>
>>>
>>> No objection from me. Please feel free to send a patch. ;)
>>
>>
>> Shrug, I actually meant this as feedback to your patch, since you move that
>> helper and not as a note to myself. ;)
>
> Interesting, my patch only moves what it needs, why does it need
> to do more?

In case you miss the context:
http://marc.info/?l=linux-netdev&m=146730654005424&w=2

This patch should be backported to stable too, which is another
reason why we should keep it as small as possible.

Here, at Twitter, we already backported it to 4.1 kernel for testing.

(The reason why I don't have a Fixes: tag is that I haven't identified an
offending commit to blame yet.)


Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Cong Wang
On Thu, Jun 30, 2016 at 4:11 PM, Daniel Borkmann  wrote:
> On 07/01/2016 12:42 AM, Cong Wang wrote:
>>
>> On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann 
>> wrote:
>>>
>>>
>>> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
>>> to the header then? Both seem similarly small at least (could be split
>>> f.e into two patches then, first for the move, second for the actual
>>> fix).
>>
>>
>> No objection from me. Please feel free to send a patch. ;)
>
>
> Shrug, I actually meant this as feedback to your patch, since you move that
> helper and not as a note to myself. ;)

Interesting, my patch only moves what it needs, why does it need
to do more?

Again, I am not against your idea, just 1) it doesn't belong to my patch
2) I am too lazy to create a patch for it, or, I am perfectly fine with not
moving it too ;)


Re: [PATCH 5/7] net: ethernet: bgmac: Add platform device support

2016-06-30 Thread Florian Fainelli
[snip]

+
> + return 0;
> +
> +err2:
> + devm_iounmap(&pdev->dev, bgmac->plat.idm_base);
> +err1:
> + devm_iounmap(&pdev->dev, bgmac->plat.base);
> +err:
> + devm_kfree(&pdev->dev, bgmac);


This is not needed actually, now that you use the device managed helper
functions.

> +
> + return rc;
> +}
> +
> +static int bgmac_remove(struct platform_device *pdev)
> +{
> + struct bgmac *bgmac = platform_get_drvdata(pdev);
> +
> + bgmac_enet_remove(bgmac);
> + devm_iounmap(&pdev->dev, bgmac->plat.idm_base);
> + devm_iounmap(&pdev->dev, bgmac->plat.base);
> + devm_kfree(&pdev->dev, bgmac);

Same here.
-- 
Florian


[PATCH net-next 5/9] RDS: TCP: make ->sk_user_data point to a rds_conn_path

2016-06-30 Thread Sowmini Varadhan
The socket callbacks should all operate on a struct rds_conn_path,
in preparation for a MP capable RDS-TCP.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |   25 +
 net/rds/tcp.h |4 ++--
 net/rds/tcp_connect.c |   16 
 net/rds/tcp_listen.c  |   12 ++--
 net/rds/tcp_recv.c|   12 ++--
 net/rds/tcp_send.c|   12 ++--
 6 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index b327727..5658f3e 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -136,9 +136,9 @@ void rds_tcp_restore_callbacks(struct socket *sock,
  * from being called while it isn't set.
  */
 void rds_tcp_reset_callbacks(struct socket *sock,
-struct rds_connection *conn)
+struct rds_conn_path *cp)
 {
-   struct rds_tcp_connection *tc = conn->c_transport_data;
+   struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *osock = tc->t_sock;
 
if (!osock)
@@ -148,8 +148,8 @@ void rds_tcp_reset_callbacks(struct socket *sock,
 * We have an outstanding SYN to this peer, which may
 * potentially have transitioned to the RDS_CONN_UP state,
 * so we must quiesce any send threads before resetting
-* c_transport_data. We quiesce these threads by setting
-* c_state to something other than RDS_CONN_UP, and then
+* cp_transport_data. We quiesce these threads by setting
+* cp_state to something other than RDS_CONN_UP, and then
 * waiting for any existing threads in rds_send_xmit to
 * complete release_in_xmit(). (Subsequent threads entering
 * rds_send_xmit() will bail on !rds_conn_up().
@@ -164,8 +164,8 @@ void rds_tcp_reset_callbacks(struct socket *sock,
 * RDS_CONN_RESETTTING, to ensure that rds_tcp_state_change
 * cannot mark rds_conn_path_up() in the window before lock_sock()
 */
-   atomic_set(&conn->c_state, RDS_CONN_RESETTING);
-   wait_event(conn->c_waitq, !test_bit(RDS_IN_XMIT, &conn->c_flags));
+   atomic_set(&cp->cp_state, RDS_CONN_RESETTING);
+   wait_event(cp->cp_waitq, !test_bit(RDS_IN_XMIT, &cp->cp_flags));
lock_sock(osock->sk);
/* reset receive side state for rds_tcp_data_recv() for osock  */
if (tc->t_tinc) {
@@ -186,11 +186,12 @@ void rds_tcp_reset_callbacks(struct socket *sock,
release_sock(osock->sk);
sock_release(osock);
 newsock:
-   rds_send_path_reset(&conn->c_path[0]);
+   rds_send_path_reset(cp);
lock_sock(sock->sk);
write_lock_bh(&sock->sk->sk_callback_lock);
tc->t_sock = sock;
-   sock->sk->sk_user_data = conn;
+   tc->t_cpath = cp;
+   sock->sk->sk_user_data = cp;
sock->sk->sk_data_ready = rds_tcp_data_ready;
sock->sk->sk_write_space = rds_tcp_write_space;
sock->sk->sk_state_change = rds_tcp_state_change;
@@ -203,9 +204,9 @@ void rds_tcp_reset_callbacks(struct socket *sock,
  * above rds_tcp_reset_callbacks for notes about synchronization
  * with data path
  */
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
+void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
 {
-   struct rds_tcp_connection *tc = conn->c_transport_data;
+   struct rds_tcp_connection *tc = cp->cp_transport_data;
 
rdsdebug("setting sock %p callbacks to tc %p\n", sock, tc);
write_lock_bh(&sock->sk->sk_callback_lock);
@@ -221,12 +222,12 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
sock->sk->sk_data_ready = sock->sk->sk_user_data;
 
tc->t_sock = sock;
-   tc->t_cpath = &conn->c_path[0];
+   tc->t_cpath = cp;
tc->t_orig_data_ready = sock->sk->sk_data_ready;
tc->t_orig_write_space = sock->sk->sk_write_space;
tc->t_orig_state_change = sock->sk->sk_state_change;
 
-   sock->sk->sk_user_data = conn;
+   sock->sk->sk_user_data = cp;
sock->sk->sk_data_ready = rds_tcp_data_ready;
sock->sk->sk_write_space = rds_tcp_write_space;
sock->sk->sk_state_change = rds_tcp_state_change;
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index e1ff169..151b09d 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -49,8 +49,8 @@ struct rds_tcp_statistics {
 /* tcp.c */
 void rds_tcp_tune(struct socket *sock);
 void rds_tcp_nonagle(struct socket *sock);
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn);
-void rds_tcp_reset_callbacks(struct socket *sock, struct rds_connection *conn);
+void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
+void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_restore_callbacks(struct socket *sock,
   struct rds_tcp_connection *tc);
 u32 rds_tcp_snd_nxt(struct 

[PATCH net-next 6/9] RDS: TCP: make receive path use the rds_conn_path

2016-06-30 Thread Sowmini Varadhan
The ->sk_user_data contains a pointer to the rds_conn_path
for the socket. Use this consistently in the rds_tcp_data_ready
callbacks to get the rds_conn_path for rds_recv_incoming.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/ib.c   |2 +-
 net/rds/ib.h   |2 +-
 net/rds/ib_recv.c  |3 ++-
 net/rds/loop.c |4 ++--
 net/rds/rds.h  |2 +-
 net/rds/tcp.c  |2 +-
 net/rds/tcp.h  |2 +-
 net/rds/tcp_recv.c |   29 -
 net/rds/threads.c  |2 +-
 9 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 1b29ec9..e6ba856 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -385,7 +385,7 @@ struct rds_transport rds_ib_transport = {
.xmit   = rds_ib_xmit,
.xmit_rdma  = rds_ib_xmit_rdma,
.xmit_atomic= rds_ib_xmit_atomic,
-   .recv   = rds_ib_recv,
+   .recv_path  = rds_ib_recv_path,
.conn_alloc = rds_ib_conn_alloc,
.conn_free  = rds_ib_conn_free,
.conn_connect   = rds_ib_conn_connect,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 2051f4b..579de7e 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -354,7 +354,7 @@ void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
 /* ib_recv.c */
 int rds_ib_recv_init(void);
 void rds_ib_recv_exit(void);
-int rds_ib_recv(struct rds_connection *conn);
+int rds_ib_recv_path(struct rds_conn_path *conn);
 int rds_ib_recv_alloc_caches(struct rds_ib_connection *ic);
 void rds_ib_recv_free_caches(struct rds_ib_connection *ic);
 void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 4ea8cb1..606a11f 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -1009,8 +1009,9 @@ void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic,
rds_ib_recv_refill(conn, 0, GFP_NOWAIT);
 }
 
-int rds_ib_recv(struct rds_connection *conn)
+int rds_ib_recv_path(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
int ret = 0;
 
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 318c21d..20284a4 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -102,7 +102,7 @@ static void rds_loop_inc_free(struct rds_incoming *inc)
 }
 
 /* we need to at least give the thread something to succeed */
-static int rds_loop_recv(struct rds_connection *conn)
+static int rds_loop_recv_path(struct rds_conn_path *cp)
 {
return 0;
 }
@@ -185,7 +185,7 @@ void rds_loop_exit(void)
  */
 struct rds_transport rds_loop_transport = {
.xmit   = rds_loop_xmit,
-   .recv   = rds_loop_recv,
+   .recv_path  = rds_loop_recv_path,
.conn_alloc = rds_loop_conn_alloc,
.conn_free  = rds_loop_conn_free,
.conn_connect   = rds_loop_conn_connect,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 5bbad08..0faca30 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -462,7 +462,7 @@ struct rds_transport {
unsigned int hdr_off, unsigned int sg, unsigned int off);
int (*xmit_rdma)(struct rds_connection *conn, struct rm_rdma_op *op);
int (*xmit_atomic)(struct rds_connection *conn, struct rm_atomic_op 
*op);
-   int (*recv)(struct rds_connection *conn);
+   int (*recv_path)(struct rds_conn_path *cp);
int (*inc_copy_to_user)(struct rds_incoming *inc, struct iov_iter *to);
void (*inc_free)(struct rds_incoming *inc);
 
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 5658f3e..7bc136c 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -359,7 +359,7 @@ struct rds_transport rds_tcp_transport = {
.xmit_path_prepare  = rds_tcp_xmit_path_prepare,
.xmit_path_complete = rds_tcp_xmit_path_complete,
.xmit   = rds_tcp_xmit,
-   .recv   = rds_tcp_recv,
+   .recv_path  = rds_tcp_recv_path,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free  = rds_tcp_conn_free,
.conn_connect   = rds_tcp_conn_connect,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 151b09d..5a5f91a 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -75,7 +75,7 @@ int rds_tcp_keepalive(struct socket *sock);
 int rds_tcp_recv_init(void);
 void rds_tcp_recv_exit(void);
 void rds_tcp_data_ready(struct sock *sk);
-int rds_tcp_recv(struct rds_connection *conn);
+int rds_tcp_recv_path(struct rds_conn_path *cp);
 void rds_tcp_inc_free(struct rds_incoming *inc);
 int rds_tcp_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
 
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index aa7a79a..ad4892e 100644
--- a/net/rds/tcp_recv.c

[PATCH net-next 1/9] RDS: Rework path specific indirections

2016-06-30 Thread Sowmini Varadhan
Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c  |5 +
 net/rds/ib.c  |4 ++--
 net/rds/ib.h  |4 ++--
 net/rds/ib_cm.c   |3 ++-
 net/rds/ib_send.c |3 ++-
 net/rds/loop.c|4 ++--
 net/rds/rds.h |3 ---
 net/rds/send.c|   16 
 net/rds/tcp.c |6 +++---
 net/rds/tcp.h |6 +++---
 net/rds/tcp_connect.c |7 ---
 net/rds/tcp_send.c|8 
 12 files changed, 29 insertions(+), 40 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index a4b07c8..17c2f25 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -326,10 +326,7 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
wait_event(cp->cp_waitq,
   !test_bit(RDS_RECV_REFILL, &cp->cp_flags));
 
-   if (!conn->c_trans->t_mp_capable)
-   conn->c_trans->conn_shutdown(conn);
-   else
-   conn->c_trans->conn_path_shutdown(cp);
+   conn->c_trans->conn_path_shutdown(cp);
rds_conn_path_reset(cp);
 
if (!rds_conn_path_transition(cp, RDS_CONN_DISCONNECTING,
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 44946a6..1b29ec9 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -381,7 +381,7 @@ void rds_ib_exit(void)
 
 struct rds_transport rds_ib_transport = {
.laddr_check= rds_ib_laddr_check,
-   .xmit_complete  = rds_ib_xmit_complete,
+   .xmit_path_complete = rds_ib_xmit_path_complete,
.xmit   = rds_ib_xmit,
.xmit_rdma  = rds_ib_xmit_rdma,
.xmit_atomic= rds_ib_xmit_atomic,
@@ -389,7 +389,7 @@ struct rds_transport rds_ib_transport = {
.conn_alloc = rds_ib_conn_alloc,
.conn_free  = rds_ib_conn_free,
.conn_connect   = rds_ib_conn_connect,
-   .conn_shutdown  = rds_ib_conn_shutdown,
+   .conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user   = rds_ib_inc_copy_to_user,
.inc_free   = rds_ib_inc_free,
.cm_initiate_connect= rds_ib_cm_initiate_connect,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 627fb79..2051f4b 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -329,7 +329,7 @@ extern struct list_head ib_nodev_conns;
 int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
 void rds_ib_conn_free(void *arg);
 int rds_ib_conn_connect(struct rds_connection *conn);
-void rds_ib_conn_shutdown(struct rds_connection *conn);
+void rds_ib_conn_path_shutdown(struct rds_conn_path *cp);
 void rds_ib_state_change(struct sock *sk);
 int rds_ib_listen_init(void);
 void rds_ib_listen_stop(void);
@@ -384,7 +384,7 @@ u32 rds_ib_ring_completed(struct rds_ib_work_ring *ring, 
u32 wr_id, u32 oldest);
 extern wait_queue_head_t rds_ib_ring_empty_wait;
 
 /* ib_send.c */
-void rds_ib_xmit_complete(struct rds_connection *conn);
+void rds_ib_xmit_path_complete(struct rds_conn_path *cp);
 int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
unsigned int hdr_off, unsigned int sg, unsigned int off);
 void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index e48bb1b..e34ea0b 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -731,8 +731,9 @@ int rds_ib_conn_connect(struct rds_connection *conn)
  * so that it can be called at any point during startup.  In fact it
  * can be called multiple times for a given connection.
  */
-void rds_ib_conn_shutdown(struct rds_connection *conn)
+void rds_ib_conn_path_shutdown(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
int err = 0;
 
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 6e4110a..84d90c9 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -980,8 +980,9 @@ int rds_ib_xmit_rdma(struct rds_connection *conn, struct 
rm_rdma_op *op)
return ret;
 }
 
-void rds_ib_xmit_complete(struct rds_connection *conn)
+void rds_ib_xmit_path_complete(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
 
/* We may have a pending ACK or window update we were unable
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 15f83db..318c21d 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -156,7 +156,7 @@ static int rds_loop_conn_connect(struct rds_connection 
*conn)
return 0;
 }
 
-static void 

[PATCH net-next 2/9] RDS: TCP: Remove dead logic around c_passive in rds-tcp

2016-06-30 Thread Sowmini Varadhan
The c_passive bit is only intended for the IB transport and will
never be encountered in rds-tcp, so remove the dead logic that
predicates on this bit.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index b139630..c56fff2 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -329,11 +329,8 @@ static void rds_tcp_destroy_conns(void)
INIT_LIST_HEAD(&rds_tcp_conn_list);
spin_unlock_irq(&rds_tcp_conn_lock);
 
-   list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
-   if (tc->conn->c_passive)
-   rds_conn_destroy(tc->conn->c_passive);
+   list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
rds_conn_destroy(tc->conn);
-   }
 }
 
 static void rds_tcp_exit(void);
@@ -512,8 +509,6 @@ static void rds_tcp_kill_sock(struct net *net)
sk = tc->t_sock->sk;
sk->sk_prot->disconnect(sk, 0);
tcp_done(sk);
-   if (tc->conn->c_passive)
-   rds_conn_destroy(tc->conn->c_passive);
rds_conn_destroy(tc->conn);
}
 }
-- 
1.7.1



[PATCH net-next 9/9] RDS: Do not send a pong to an incoming ping with 0 src port

2016-06-30 Thread Sowmini Varadhan
RDS ping messages are sent with a non-zero src port to a zero
dst port, so that the rds pong messages can be sent back to the
originator's src port. However, if a confused/malicious sender
sends a ping with a 0 src port, we'd have an infinite ping-pong
loop. To avoid this, the receiver should ignore ping messages
with a 0 src port.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/recv.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/rds/recv.c b/net/rds/recv.c
index b58f505..fed53a6 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -226,6 +226,10 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 
saddr, __be32 daddr,
cp->cp_next_rx_seq = be64_to_cpu(inc->i_hdr.h_sequence) + 1;
 
if (rds_sysctl_ping_enable && inc->i_hdr.h_dport == 0) {
+   if (inc->i_hdr.h_sport == 0) {
+   rdsdebug("ignore ping with 0 sport from 0x%x\n", saddr);
+   goto out;
+   }
rds_stats_inc(s_recv_ping);
rds_send_pong(cp, inc->i_hdr.h_sport);
goto out;
-- 
1.7.1
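
The check added above can be illustrated outside the kernel. The following is only a minimal sketch, assuming a simplified header with just the two port fields; `struct ping_hdr` and `should_send_pong()` are hypothetical names for illustration and are not part of net/rds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the port fields of an RDS header. */
struct ping_hdr {
	uint16_t sport; /* source port of the incoming message */
	uint16_t dport; /* destination port; 0 marks a ping */
};

/* Return true only when a pong should be sent back to hdr->sport. */
static bool should_send_pong(const struct ping_hdr *hdr, bool ping_enabled)
{
	if (!ping_enabled || hdr->dport != 0)
		return false; /* not a ping, or ping handling is disabled */
	if (hdr->sport == 0)
		return false; /* a reply addressed to port 0 would look like a ping */
	return true;
}
```

In this sketch a ping with a 0 src port is simply dropped, which mirrors the early `goto out` in the hunk above.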



[PATCH net-next 8/9] RDS: TCP: Simplify reconnect to avoid duelling reconnnect attempts

2016-06-30 Thread Sowmini Varadhan
When reconnecting, the peer with the smaller IP address will initiate
the reconnect, to avoid needless duelling SYN issues.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c |4 +---
 net/rds/threads.c|5 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 1b0c2a7..19a4fee 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -355,9 +355,7 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
rcu_read_lock();
if (!hlist_unhashed(&conn->c_hash_node)) {
rcu_read_unlock();
-   if (conn->c_trans->t_type != RDS_TRANS_TCP ||
-   cp->cp_outgoing == 1)
-   rds_queue_reconnect(cp);
+   rds_queue_reconnect(cp);
} else {
rcu_read_unlock();
}
diff --git a/net/rds/threads.c b/net/rds/threads.c
index e8f0941..bc97d67 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -125,6 +125,11 @@ void rds_queue_reconnect(struct rds_conn_path *cp)
  conn, &conn->c_laddr, &conn->c_faddr,
  cp->cp_reconnect_jiffies);
 
+   /* let peer with smaller addr initiate reconnect, to avoid duels */
+   if (conn->c_trans->t_type == RDS_TRANS_TCP &&
+   conn->c_laddr > conn->c_faddr)
+   return;
+
set_bit(RDS_RECONNECT_PENDING, &cp->cp_flags);
if (cp->cp_reconnect_jiffies == 0) {
cp->cp_reconnect_jiffies = rds_sysctl_reconnect_min_jiffies;
-- 
1.7.1
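
The tie-break rule can be shown as a standalone sketch. It assumes addresses are passed as plain 32-bit integers that both peers compare the same way; `local_initiates_reconnect()` is a hypothetical helper for illustration, not a function from net/rds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether the local end queues the reconnect.
 * The patch compares the stored address values directly; here they are
 * plain integers.
 */
static bool local_initiates_reconnect(uint32_t laddr, uint32_t faddr,
				      bool is_tcp)
{
	if (is_tcp && laddr > faddr)
		return false; /* leave the reconnect to the peer */
	return true;
}
```

For two distinct addresses exactly one side initiates on TCP, which is what avoids the duelling attempts; other transports keep their old behaviour in this sketch.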



[PATCH net-next 4/9] RDS: TCP: Refactor connection destruction to handle multiple paths

2016-06-30 Thread Sowmini Varadhan
A single rds_connection may have multiple rds_conn_paths that have
to be carefully and correctly destroyed, for both rmmod and
netns-delete cases.

For both cases, we extract a single rds_tcp_connection for
each conn into a temporary list, and then invoke rds_conn_destroy()
which iteratively dismantles every path in the rds_connection.

For the netns deletion case, we additionally have to make sure
that we do not leave a socket in TIME_WAIT state, as this will
hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy()
to reset state quickly.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |   46 +++---
 1 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c6b47f6..b327727 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -323,6 +323,17 @@ static void rds_tcp_conn_free(void *arg)
kmem_cache_free(rds_tcp_conn_slab, tc);
 }
 
+static bool list_has_conn(struct list_head *list, struct rds_connection *conn)
+{
+   struct rds_tcp_connection *tc, *_tc;
+
+   list_for_each_entry_safe(tc, _tc, list, t_tcp_node) {
+   if (tc->t_cpath->cp_conn == conn)
+   return true;
+   }
+   return false;
+}
+
 static void rds_tcp_destroy_conns(void)
 {
struct rds_tcp_connection *tc, *_tc;
@@ -330,8 +341,10 @@ static void rds_tcp_destroy_conns(void)
 
/* avoid calling conn_destroy with irqs off */
spin_lock_irq(&rds_tcp_conn_lock);
-   list_splice(&rds_tcp_conn_list, &tmp_list);
-   INIT_LIST_HEAD(&rds_tcp_conn_list);
+   list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
+   if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+   list_move_tail(&tc->t_tcp_node, &tmp_list);
+   }
spin_unlock_irq(&rds_tcp_conn_lock);
 
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
@@ -491,10 +504,30 @@ static struct pernet_operations rds_tcp_net_ops = {
.size = sizeof(struct rds_tcp_net),
 };
 
+/* explicitly send a RST on each socket, thereby releasing any socket refcnts
+ * that may otherwise hold up netns deletion.
+ */
+static void rds_tcp_conn_paths_destroy(struct rds_connection *conn)
+{
+   struct rds_conn_path *cp;
+   struct rds_tcp_connection *tc;
+   int i;
+   struct sock *sk;
+
+   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+   cp = &conn->c_path[i];
+   tc = cp->cp_transport_data;
+   if (!tc->t_sock)
+   continue;
+   sk = tc->t_sock->sk;
+   sk->sk_prot->disconnect(sk, 0);
+   tcp_done(sk);
+   }
+}
+
 static void rds_tcp_kill_sock(struct net *net)
 {
struct rds_tcp_connection *tc, *_tc;
-   struct sock *sk;
LIST_HEAD(tmp_list);
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
 
@@ -507,13 +540,12 @@ static void rds_tcp_kill_sock(struct net *net)
 
if (net != c_net || !tc->t_sock)
continue;
-   list_move_tail(&tc->t_tcp_node, &tmp_list);
+   if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+   list_move_tail(&tc->t_tcp_node, &tmp_list);
}
spin_unlock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
-   sk = tc->t_sock->sk;
-   sk->sk_prot->disconnect(sk, 0);
-   tcp_done(sk);
+   rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
rds_conn_destroy(tc->t_cpath->cp_conn);
}
 }
-- 
1.7.1
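
The "one rds_tcp_connection per conn" extraction described in the commit message can be sketched with plain arrays instead of kernel lists. This is an illustration only: `struct conn`, `struct path_entry`, `picked_has_conn()` and `pick_one_per_conn()` are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

struct conn { int id; };                  /* stand-in for rds_connection */
struct path_entry { struct conn *conn; }; /* stand-in for a per-path entry */

/* Return true if some already-picked entry belongs to 'conn'. */
static bool picked_has_conn(struct path_entry *const *picked, size_t n,
			    const struct conn *conn)
{
	for (size_t i = 0; i < n; i++)
		if (picked[i]->conn == conn)
			return true;
	return false;
}

/* Store one entry per distinct connection in 'picked' and return the count.
 * 'picked' must have room for n pointers.
 */
static size_t pick_one_per_conn(struct path_entry *all, size_t n,
				struct path_entry **picked)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!picked_has_conn(picked, count, all[i].conn))
			picked[count++] = &all[i];
	return count;
}
```

Destroying each picked connection once is then enough, since in the patch rds_conn_destroy() walks every path of that connection itself.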



[PATCH net-next 7/9] RDS: TCP: Hooks to set up a single connection path

2016-06-30 Thread Sowmini Varadhan
This patch adds ->conn_path_connect callbacks in the rds_transport
that are used to set up a single connection path.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/ib.c  |2 +-
 net/rds/ib.h  |2 +-
 net/rds/ib_cm.c   |3 ++-
 net/rds/loop.c|6 +++---
 net/rds/rds.h |2 +-
 net/rds/tcp.c |2 +-
 net/rds/tcp.h |4 ++--
 net/rds/tcp_connect.c |   11 ++-
 net/rds/threads.c |5 +++--
 9 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index e6ba856..7eaf887 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -388,7 +388,7 @@ struct rds_transport rds_ib_transport = {
.recv_path  = rds_ib_recv_path,
.conn_alloc = rds_ib_conn_alloc,
.conn_free  = rds_ib_conn_free,
-   .conn_connect   = rds_ib_conn_connect,
+   .conn_path_connect  = rds_ib_conn_path_connect,
.conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user   = rds_ib_inc_copy_to_user,
.inc_free   = rds_ib_inc_free,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 579de7e..046f750 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -328,7 +328,7 @@ extern struct list_head ib_nodev_conns;
 /* ib_cm.c */
 int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
 void rds_ib_conn_free(void *arg);
-int rds_ib_conn_connect(struct rds_connection *conn);
+int rds_ib_conn_path_connect(struct rds_conn_path *cp);
 void rds_ib_conn_path_shutdown(struct rds_conn_path *cp);
 void rds_ib_state_change(struct sock *sk);
 int rds_ib_listen_init(void);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index e34ea0b..5b2ab95 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -685,8 +685,9 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id)
return ret;
 }
 
-int rds_ib_conn_connect(struct rds_connection *conn)
+int rds_ib_conn_path_connect(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
struct sockaddr_in src, dest;
int ret;
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 20284a4..f2bf78d 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -150,9 +150,9 @@ static void rds_loop_conn_free(void *arg)
kfree(lc);
 }
 
-static int rds_loop_conn_connect(struct rds_connection *conn)
+static int rds_loop_conn_path_connect(struct rds_conn_path *cp)
 {
-   rds_connect_complete(conn);
+   rds_connect_complete(cp->cp_conn);
return 0;
 }
 
@@ -188,7 +188,7 @@ struct rds_transport rds_loop_transport = {
.recv_path  = rds_loop_recv_path,
.conn_alloc = rds_loop_conn_alloc,
.conn_free  = rds_loop_conn_free,
-   .conn_connect   = rds_loop_conn_connect,
+   .conn_path_connect  = rds_loop_conn_path_connect,
.conn_path_shutdown = rds_loop_conn_path_shutdown,
.inc_copy_to_user   = rds_message_inc_copy_to_user,
.inc_free   = rds_loop_inc_free,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 0faca30..6ef07bd 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -454,7 +454,7 @@ struct rds_transport {
int (*laddr_check)(struct net *net, __be32 addr);
int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp);
void (*conn_free)(void *data);
-   int (*conn_connect)(struct rds_connection *conn);
+   int (*conn_path_connect)(struct rds_conn_path *cp);
void (*conn_path_shutdown)(struct rds_conn_path *conn);
void (*xmit_path_prepare)(struct rds_conn_path *cp);
void (*xmit_path_complete)(struct rds_conn_path *cp);
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 7bc136c..d278432 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -362,7 +362,7 @@ struct rds_transport rds_tcp_transport = {
.recv_path  = rds_tcp_recv_path,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free  = rds_tcp_conn_free,
-   .conn_connect   = rds_tcp_conn_connect,
+   .conn_path_connect  = rds_tcp_conn_path_connect,
.conn_path_shutdown = rds_tcp_conn_path_shutdown,
.inc_copy_to_user   = rds_tcp_inc_copy_to_user,
.inc_free   = rds_tcp_inc_free,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 5a5f91a..1c3160f 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -13,7 +13,7 @@ struct rds_tcp_connection {
struct list_head t_tcp_node;
struct rds_conn_path*t_cpath;
/* t_conn_path_lock synchronizes the connection establishment between
-* rds_tcp_accept_one and rds_tcp_conn_connect
+* rds_tcp_accept_one and rds_tcp_conn_path_connect
 */
struct mutex t_conn_path_lock;
struct socket

[PATCH net-next 3/9] RDS: TCP: Make rds_tcp_connection track the rds_conn_path

2016-06-30 Thread Sowmini Varadhan
The struct rds_tcp_connection is the transport-specific private
data structure that tracks TCP information per rds_conn_path.
Modify this structure to have a back-pointer to the rds_conn_path
for which it is the ->cp_transport_data.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c  |   30 +++---
 net/rds/tcp.c |   44 +---
 net/rds/tcp.h |6 +++---
 net/rds/tcp_connect.c |6 +++---
 net/rds/tcp_listen.c  |4 ++--
 5 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 17c2f25..1b0c2a7 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -253,9 +253,12 @@ static struct rds_connection *__rds_conn_create(struct net 
*net,
 
for (i = 0; i < RDS_MPATH_WORKERS; i++) {
cp = &conn->c_path[i];
-   trans->conn_free(cp->cp_transport_data);
-   if (!trans->t_mp_capable)
-   break;
+   /* The ->conn_alloc invocation may have
+* allocated resource for all paths, so all
+* of them may have to be freed here.
+*/
+   if (cp->cp_transport_data)
+   trans->conn_free(cp->cp_transport_data);
}
kmem_cache_free(rds_conn_slab, conn);
conn = found;
@@ -367,6 +370,9 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp)
 {
struct rds_message *rm, *rtmp;
 
+   if (!cp->cp_transport_data)
+   return;
+
rds_conn_path_drop(cp);
flush_work(&cp->cp_down_w);
 
@@ -398,6 +404,8 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp)
 void rds_conn_destroy(struct rds_connection *conn)
 {
unsigned long flags;
+   int i;
+   struct rds_conn_path *cp;
 
rdsdebug("freeing conn %p for %pI4 -> "
 "%pI4\n", conn, &conn->c_laddr,
@@ -410,18 +418,10 @@ void rds_conn_destroy(struct rds_connection *conn)
synchronize_rcu();
 
/* shut the connection down */
-   if (!conn->c_trans->t_mp_capable) {
-   rds_conn_path_destroy(&conn->c_path[0]);
-   BUG_ON(!list_empty(&conn->c_path[0].cp_retrans));
-   } else {
-   int i;
-   struct rds_conn_path *cp;
-
-   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
-   cp = &conn->c_path[i];
-   rds_conn_path_destroy(cp);
-   BUG_ON(!list_empty(&cp->cp_retrans));
-   }
+   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+   cp = &conn->c_path[i];
+   rds_conn_path_destroy(cp);
+   BUG_ON(!list_empty(&cp->cp_retrans));
}
 
/*
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c56fff2..c6b47f6 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -221,7 +221,7 @@ void rds_tcp_set_callbacks(struct socket *sock, struct 
rds_connection *conn)
sock->sk->sk_data_ready = sock->sk->sk_user_data;
 
tc->t_sock = sock;
-   tc->conn = conn;
+   tc->t_cpath = &conn->c_path[0];
tc->t_orig_data_ready = sock->sk->sk_data_ready;
tc->t_orig_write_space = sock->sk->sk_write_space;
tc->t_orig_state_change = sock->sk->sk_state_change;
@@ -284,24 +284,29 @@ static int rds_tcp_laddr_check(struct net *net, __be32 
addr)
 static int rds_tcp_conn_alloc(struct rds_connection *conn, gfp_t gfp)
 {
struct rds_tcp_connection *tc;
+   int i;
 
-   tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp);
-   if (!tc)
-   return -ENOMEM;
+   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+   tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp);
+   if (!tc)
+   return -ENOMEM;
 
-   mutex_init(&tc->t_conn_lock);
-   tc->t_sock = NULL;
-   tc->t_tinc = NULL;
-   tc->t_tinc_hdr_rem = sizeof(struct rds_header);
-   tc->t_tinc_data_rem = 0;
+   mutex_init(&tc->t_conn_path_lock);
+   tc->t_sock = NULL;
+   tc->t_tinc = NULL;
+   tc->t_tinc_hdr_rem = sizeof(struct rds_header);
+   tc->t_tinc_data_rem = 0;
 
-   conn->c_transport_data = tc;
+   conn->c_path[i].cp_transport_data = tc;
+   tc->t_cpath = &conn->c_path[i];
 
-   spin_lock_irq(&rds_tcp_conn_lock);
-   list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
-   spin_unlock_irq(&rds_tcp_conn_lock);
+   spin_lock_irq(&rds_tcp_conn_lock);
+   list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
+   spin_unlock_irq(&rds_tcp_conn_lock);
+   rdsdebug("rds_conn_path [%d] tc %p\n", i,
+   

[PATCH net-next 0/9] RDS:TCP data structure changes for multipath support

2016-06-30 Thread Sowmini Varadhan
This is the second installment of changes to enable multipath support in
RDS-TCP. This series implements the changes in rds-tcp so that the
rds_conn_path has a pointer to the rds_tcp_connection in cp_transport_data.
Struct rds_tcp_connection keeps track of the inet_sk per path in
t_sock. The ->sk_user_data in turn is a pointer to the rds_conn_path.
With this set of changes, rds_tcp has the needed plumbing to handle
multiple paths (sockets) per rds_connection.

Sowmini Varadhan (9):
  RDS: Rework path specific indirections
  RDS: TCP: Remove dead logic around c_passive in rds-tcp
  RDS: TCP: Make rds_tcp_connection track the rds_conn_path
  RDS: TCP: Refactor connection destruction to handle multiple paths
  RDS: TCP: make ->sk_user_data point to a rds_conn_path
  RDS: TCP: make receive path use the rds_conn_path
  RDS: TCP: Hooks to set up a single connection path
  RDS: TCP: Simplify reconnect to avoid duelling reconnnect attempts
  RDS: Do not send a pong to an incoming ping with 0 src port

 net/rds/connection.c  |   39 ++
 net/rds/ib.c  |8 ++--
 net/rds/ib.h  |8 ++--
 net/rds/ib_cm.c   |6 ++-
 net/rds/ib_recv.c |3 +-
 net/rds/ib_send.c |3 +-
 net/rds/loop.c|   14 +++---
 net/rds/rds.h |7 +--
 net/rds/recv.c|4 ++
 net/rds/send.c|   16 ++-
 net/rds/tcp.c |  130 +++--
 net/rds/tcp.h |   22 
 net/rds/tcp_connect.c |   38 ---
 net/rds/tcp_listen.c  |   16 +++---
 net/rds/tcp_recv.c|   39 ---
 net/rds/tcp_send.c|   20 
 net/rds/threads.c |   12 +++-
 17 files changed, 211 insertions(+), 174 deletions(-)
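
The pointer relationships described in this cover letter can be sketched as follows. This is a simplified model under stated assumptions, not the kernel definitions: the struct names, `struct sock_stub`, the helper functions and the value of `MPATH_WORKERS` are all hypothetical; only the field names `c_path`, `cp_conn`, `cp_transport_data`, `t_cpath`, `t_sock` and `sk_user_data` come from the series.

```c
#include <stddef.h>

#define MPATH_WORKERS 8 /* assumed value; the series uses RDS_MPATH_WORKERS */

struct conn;
struct tcp_conn;

struct conn_path {
	struct conn *cp_conn;               /* owning connection */
	struct tcp_conn *cp_transport_data; /* per-path TCP state */
};

struct sock_stub {
	void *sk_user_data; /* points back at the conn_path */
};

struct tcp_conn {
	struct conn_path *t_cpath; /* back-pointer to the path */
	struct sock_stub *t_sock;  /* socket for this path, NULL until attached */
};

struct conn {
	struct conn_path c_path[MPATH_WORKERS];
};

/* Wire path i of 'conn' to the per-path state 'tc'. */
static void link_path(struct conn *conn, int i, struct tcp_conn *tc)
{
	conn->c_path[i].cp_conn = conn;
	conn->c_path[i].cp_transport_data = tc;
	tc->t_cpath = &conn->c_path[i];
	tc->t_sock = NULL;
}

/* Attach a socket and point its user data at the path. */
static void attach_sock(struct tcp_conn *tc, struct sock_stub *sk)
{
	tc->t_sock = sk;
	sk->sk_user_data = tc->t_cpath;
}

/* Recover the connection from a socket, as a data-ready callback might. */
static struct conn *conn_from_sock(const struct sock_stub *sk)
{
	const struct conn_path *cp = sk->sk_user_data;

	return cp ? cp->cp_conn : NULL;
}
```

With this layout a socket callback only needs ->sk_user_data to reach the path, and ->cp_conn to reach the connection.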



Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Daniel Borkmann

On 07/01/2016 12:42 AM, Cong Wang wrote:
> On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann  wrote:
>> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
>> to the header then? Both seem similarly small at least (could be split
>> f.e into two patches then, first for the move, second for the actual fix).
>
> No objection from me. Please feel free to send a patch. ;)

Shrug, I actually meant this as feedback to your patch, since you move that
helper and not as a note to myself. ;)

Thanks,
Daniel


[PATCH 3/7] net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file

2016-06-30 Thread Jon Mason
Move the BCMA MDIO phy into a separate file, as it is very tightly
coupled with the BCMA bus.  This will help with the upcoming BCMA
removal from the bgmac driver.  Optimally, this should be moved into
phy drivers, but it is too tightly coupled with the bgmac driver to
effectively move it without more changes to the driver.

Note: the phy_reset was intentionally removed, as the mdio phy subsystem
automatically resets the phy if a reset function pointer is present.  In
addition to moving the driver, this patch adds that reset function.

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/Makefile  |   2 +-
 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c | 264 
 drivers/net/ethernet/broadcom/bgmac.c   | 246 +++---
 drivers/net/ethernet/broadcom/bgmac.h   |   3 +
 4 files changed, 298 insertions(+), 217 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c

diff --git a/drivers/net/ethernet/broadcom/Makefile 
b/drivers/net/ethernet/broadcom/Makefile
index 00584d7..f559794 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -10,6 +10,6 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
-obj-$(CONFIG_BGMAC) += bgmac.o
+obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
 obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
 obj-$(CONFIG_BNXT) += bnxt/
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c 
b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
new file mode 100644
index 0000000..1e65349
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
@@ -0,0 +1,264 @@
+/*
+ * Driver for (BCM4706)? GBit MAC core on BCMA bus.
+ *
+ * Copyright (C) 2012 Rafał Miłecki 
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include "bgmac.h"
+
+struct bcma_mdio {
+   struct bcma_device *core;
+   u8 phyaddr;
+};
+
+static bool bcma_mdio_wait_value(struct bcma_device *core, u16 reg, u32 mask,
+u32 value, int timeout)
+{
+   u32 val;
+   int i;
+
+   for (i = 0; i < timeout / 10; i++) {
+   val = bcma_read32(core, reg);
+   if ((val & mask) == value)
+   return true;
+   udelay(10);
+   }
+   dev_err(&core->dev, "Timeout waiting for reg 0x%X\n", reg);
+   return false;
+}
+
+/**
+ * PHY ops
+ **/
+
+static u16 bcma_mdio_phy_read(struct bcma_mdio *bcma_mdio, u8 phyaddr, u8 reg)
+{
+   struct bcma_device *core;
+   u16 phy_access_addr;
+   u16 phy_ctl_addr;
+   u32 tmp;
+
+   BUILD_BUG_ON(BGMAC_PA_DATA_MASK != BCMA_GMAC_CMN_PA_DATA_MASK);
+   BUILD_BUG_ON(BGMAC_PA_ADDR_MASK != BCMA_GMAC_CMN_PA_ADDR_MASK);
+   BUILD_BUG_ON(BGMAC_PA_ADDR_SHIFT != BCMA_GMAC_CMN_PA_ADDR_SHIFT);
+   BUILD_BUG_ON(BGMAC_PA_REG_MASK != BCMA_GMAC_CMN_PA_REG_MASK);
+   BUILD_BUG_ON(BGMAC_PA_REG_SHIFT != BCMA_GMAC_CMN_PA_REG_SHIFT);
+   BUILD_BUG_ON(BGMAC_PA_WRITE != BCMA_GMAC_CMN_PA_WRITE);
+   BUILD_BUG_ON(BGMAC_PA_START != BCMA_GMAC_CMN_PA_START);
+   BUILD_BUG_ON(BGMAC_PC_EPA_MASK != BCMA_GMAC_CMN_PC_EPA_MASK);
+   BUILD_BUG_ON(BGMAC_PC_MCT_MASK != BCMA_GMAC_CMN_PC_MCT_MASK);
+   BUILD_BUG_ON(BGMAC_PC_MCT_SHIFT != BCMA_GMAC_CMN_PC_MCT_SHIFT);
+   BUILD_BUG_ON(BGMAC_PC_MTE != BCMA_GMAC_CMN_PC_MTE);
+
+   if (bcma_mdio->core->id.id == BCMA_CORE_4706_MAC_GBIT) {
+   core = bcma_mdio->core->bus->drv_gmac_cmn.core;
+   phy_access_addr = BCMA_GMAC_CMN_PHY_ACCESS;
+   phy_ctl_addr = BCMA_GMAC_CMN_PHY_CTL;
+   } else {
+   core = bcma_mdio->core;
+   phy_access_addr = BGMAC_PHY_ACCESS;
+   phy_ctl_addr = BGMAC_PHY_CNTL;
+   }
+
+   tmp = bcma_read32(core, phy_ctl_addr);
+   tmp &= ~BGMAC_PC_EPA_MASK;
+   tmp |= phyaddr;
+   bcma_write32(core, phy_ctl_addr, tmp);
+
+   tmp = BGMAC_PA_START;
+   tmp |= phyaddr << BGMAC_PA_ADDR_SHIFT;
+   tmp |= reg << BGMAC_PA_REG_SHIFT;
+   bcma_write32(core, phy_access_addr, tmp);
+
+   if (!bcma_mdio_wait_value(core, phy_access_addr, BGMAC_PA_START, 0,
+ 1000)) {
+   dev_err(&core->dev, "Reading PHY %d register 0x%X failed\n",
+   phyaddr, reg);
+   return 0xffff;
+   }
+
+   return bcma_read32(core, phy_access_addr) & BGMAC_PA_DATA_MASK;
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphywr */
+static int bcma_mdio_phy_write(struct bcma_mdio *bcma_mdio, u8 phyaddr, u8 reg,
+  u16 value)
+{
+   struct bcma_device *core;
+   u16 

[PATCH 5/7] net: ethernet: bgmac: Add platform device support

2016-06-30 Thread Jon Mason
The bcma portion of the driver has been split off into a bcma specific
driver.  This has been mirrored for the platform driver.  The last
references to the bcma core struct have been changed into a generic
function call.  These function calls are wrappers to either the original
bcma code or new platform functions that access the same areas via MMIO.
This necessitated adding function pointers for both platform and bcma to
hide which backend is being used from the generic bgmac code.

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/Kconfig   |  23 +-
 drivers/net/ethernet/broadcom/Makefile  |   4 +-
 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c |   2 +
 drivers/net/ethernet/broadcom/bgmac-bcma.c  | 315 +++
 drivers/net/ethernet/broadcom/bgmac-platform.c  | 210 +++
 drivers/net/ethernet/broadcom/bgmac.c   | 329 
 drivers/net/ethernet/broadcom/bgmac.h   |  73 +-
 7 files changed, 671 insertions(+), 285 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c

diff --git a/drivers/net/ethernet/broadcom/Kconfig 
b/drivers/net/ethernet/broadcom/Kconfig
index d74a92e..bd8c80c 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -140,10 +140,18 @@ config BNX2X_SRIOV
  allows for virtual function acceleration in virtual environments.
 
 config BGMAC
-   tristate "BCMA bus GBit core support"
+   tristate
+   help
+ This enables the integrated ethernet controller support for many
+ Broadcom (mostly iProc) SoCs. An appropriate bus interface driver
+ needs to be enabled to select this.
+
+config BGMAC_BCMA
+   tristate "Broadcom iProc GBit BCMA support"
depends on BCMA && BCMA_HOST_SOC
depends on HAS_DMA
depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
+   select BGMAC
select PHYLIB
select FIXED_PHY
---help---
@@ -152,6 +160,19 @@ config BGMAC
  In case of using this driver on BCM4706 it's also requires to enable
  BCMA_DRIVER_GMAC_CMN to make it work.
 
+config BGMAC_PLATFORM
+   tristate "Broadcom iProc GBit platform support"
+   depends on HAS_DMA
+   depends on ARCH_BCM_IPROC || COMPILE_TEST
+   depends on OF
+   select BGMAC
+   select PHYLIB
+   select FIXED_PHY
+   default ARCH_BCM_IPROC
+   ---help---
+ Say Y here if you want to use the Broadcom iProc Gigabit Ethernet
+ controller through the generic platform interface
+
 config SYSTEMPORT
tristate "Broadcom SYSTEMPORT internal MAC support"
depends on OF
diff --git a/drivers/net/ethernet/broadcom/Makefile 
b/drivers/net/ethernet/broadcom/Makefile
index f559794..79f2372 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -10,6 +10,8 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
-obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC) += bgmac.o
+obj-$(CONFIG_BGMAC_BCMA) += bgmac-bcma.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC_PLATFORM) += bgmac-platform.o
 obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
 obj-$(CONFIG_BNXT) += bnxt/
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c 
b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
index 1e65349..7c19c8e 100644
--- a/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
@@ -245,6 +245,7 @@ err:
kfree(bcma_mdio);
return ERR_PTR(err);
 }
+EXPORT_SYMBOL_GPL(bcma_mdio_mii_register);
 
 void bcma_mdio_mii_unregister(struct mii_bus *mii_bus)
 {
@@ -259,6 +260,7 @@ void bcma_mdio_mii_unregister(struct mii_bus *mii_bus)
mdiobus_free(mii_bus);
kfree(bcma_mdio);
 }
+EXPORT_SYMBOL_GPL(bcma_mdio_mii_unregister);
 
 MODULE_AUTHOR("Rafał Miłecki");
 MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c 
b/drivers/net/ethernet/broadcom/bgmac-bcma.c
new file mode 100644
index 000..9a9745c4
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
@@ -0,0 +1,315 @@
+/*
+ * Driver for (BCM4706)? GBit MAC core on BCMA bus.
+ *
+ * Copyright (C) 2012 Rafał Miłecki 
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include "bgmac.h"
+
+static inline bool bgmac_is_bcm4707_family(struct bcma_device *core)
+{
+   switch (core->bus->chipinfo.id) {
+   case BCMA_CHIP_ID_BCM4707:
+   case BCMA_CHIP_ID_BCM47094:
+   case BCMA_CHIP_ID_BCM53018:
+   return true;
+   default:
+   return false;
+   }
+}
+
+/**
+ * 

[PATCH 4/7] net: ethernet: bgmac: convert to feature flags

2016-06-30 Thread Jon Mason
The bgmac driver uses the bcma-provided device ID and revision, as
well as the SoC ID and package, to determine which features need to be
enabled, reset, etc. in the driver.  In anticipation of removing the
bcma requirement for this driver, these checks must be changed to not
reference that struct.  In place of that, each "feature" has been
given a flag, and the flags are enabled for their respective device and
SoC.
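
As a rough illustration of the conversion described above, the sketch
below computes a flag word once from the bus-provided IDs and lets the
use sites test flags only.  It is a user-space sketch, not the patch
itself: the flag names mirror ones visible in the diff, but the bit
values, the toy_ helpers, the core ID value and the exact ID-to-flag
mapping are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the diff; the bit positions are assumed. */
#define BGMAC_FEAT_TX_MASK_SETUP   (1u << 0)
#define BGMAC_FEAT_RX_MASK_SETUP   (1u << 1)
#define BGMAC_FEAT_CMDCFG_SR_REV4  (1u << 2)
#define BGMAC_FEAT_NO_CLR_MIB      (1u << 3)

/* Hypothetical stand-in for the bcma core ID; the value is illustrative. */
#define TOY_CORE_4706_MAC_GBIT  0x52d

/* Hypothetical helper: derive the flags once, at probe time. */
static uint32_t toy_feature_flags(uint16_t core_id, uint8_t core_rev)
{
	uint32_t flags = 0;

	/* Replaces the "rev >= 4" tests scattered over the use sites. */
	if (core_rev >= 4)
		flags |= BGMAC_FEAT_TX_MASK_SETUP |
			 BGMAC_FEAT_RX_MASK_SETUP |
			 BGMAC_FEAT_CMDCFG_SR_REV4;

	/* Replaces the comparisons against the 4706 GBit MAC core ID. */
	if (core_id == TOY_CORE_4706_MAC_GBIT)
		flags |= BGMAC_FEAT_NO_CLR_MIB;

	return flags;
}

/* A use site now depends on the flag word only, not on bus structs. */
static bool toy_should_clear_mib(uint32_t feature_flags)
{
	return !(feature_flags & BGMAC_FEAT_NO_CLR_MIB);
}
```

With this shape, a non-bcma front end only has to fill in the flag word;
it never needs to fake a device ID or revision.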

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/bgmac.c | 167 --
 drivers/net/ethernet/broadcom/bgmac.h |  21 -
 2 files changed, 140 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index 6c6bb18..b85e39a 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -109,7 +109,7 @@ static void bgmac_dma_tx_enable(struct bgmac *bgmac,
u32 ctl;
 
ctl = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL);
-   if (bgmac->core->id.rev >= 4) {
+   if (bgmac->feature_flags & BGMAC_FEAT_TX_MASK_SETUP) {
ctl &= ~BGMAC_DMA_TX_BL_MASK;
ctl |= BGMAC_DMA_TX_BL_128 << BGMAC_DMA_TX_BL_SHIFT;
 
@@ -330,7 +330,7 @@ static void bgmac_dma_rx_enable(struct bgmac *bgmac,
u32 ctl;
 
ctl = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_RX_CTL);
-   if (bgmac->core->id.rev >= 4) {
+   if (bgmac->feature_flags & BGMAC_FEAT_RX_MASK_SETUP) {
ctl &= ~BGMAC_DMA_RX_BL_MASK;
ctl |= BGMAC_DMA_RX_BL_128 << BGMAC_DMA_RX_BL_SHIFT;
 
@@ -768,14 +768,20 @@ static void bgmac_cmdcfg_maskset(struct bgmac *bgmac, u32 
mask, u32 set,
 {
u32 cmdcfg = bgmac_read(bgmac, BGMAC_CMDCFG);
u32 new_val = (cmdcfg & mask) | set;
+   u32 cmdcfg_sr;
 
-   bgmac_set(bgmac, BGMAC_CMDCFG, BGMAC_CMDCFG_SR(bgmac->core->id.rev));
+   if (bgmac->feature_flags & BGMAC_FEAT_CMDCFG_SR_REV4)
+   cmdcfg_sr = BGMAC_CMDCFG_SR_REV4;
+   else
+   cmdcfg_sr = BGMAC_CMDCFG_SR_REV0;
+
+   bgmac_set(bgmac, BGMAC_CMDCFG, cmdcfg_sr);
udelay(2);
 
if (new_val != cmdcfg || force)
bgmac_write(bgmac, BGMAC_CMDCFG, new_val);
 
-   bgmac_mask(bgmac, BGMAC_CMDCFG, ~BGMAC_CMDCFG_SR(bgmac->core->id.rev));
+   bgmac_mask(bgmac, BGMAC_CMDCFG, ~cmdcfg_sr);
udelay(2);
 }
 
@@ -804,7 +810,7 @@ static void bgmac_chip_stats_update(struct bgmac *bgmac)
 {
int i;
 
-   if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {
+   if (!(bgmac->feature_flags & BGMAC_FEAT_NO_CLR_MIB)) {
for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
bgmac->mib_tx_regs[i] =
bgmac_read(bgmac,
@@ -823,7 +829,7 @@ static void bgmac_clear_mib(struct bgmac *bgmac)
 {
int i;
 
-   if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
+   if (bgmac->feature_flags & BGMAC_FEAT_NO_CLR_MIB)
return;
 
bgmac_set(bgmac, BGMAC_DEV_CTL, BGMAC_DC_MROR);
@@ -866,9 +872,8 @@ static void bgmac_mac_speed(struct bgmac *bgmac)
 static void bgmac_miiconfig(struct bgmac *bgmac)
 {
struct bcma_device *core = bgmac->core;
-   u8 imode;
 
-   if (bgmac_is_bcm4707_family(bgmac)) {
+   if (bgmac->feature_flags & BGMAC_FEAT_FORCE_SPEED_2500) {
bcma_awrite32(core, BCMA_IOCTL,
  bcma_aread32(core, BCMA_IOCTL) | 0x40 |
  BGMAC_BCMA_IOCTL_SW_CLKEN);
@@ -876,6 +881,8 @@ static void bgmac_miiconfig(struct bgmac *bgmac)
bgmac->mac_duplex = DUPLEX_FULL;
bgmac_mac_speed(bgmac);
} else {
+   u8 imode;
+
imode = (bgmac_read(bgmac, BGMAC_DEV_STATUS) &
BGMAC_DS_MM_MASK) >> BGMAC_DS_MM_SHIFT;
if (imode == 0 || imode == 1) {
@@ -890,9 +897,7 @@ static void bgmac_miiconfig(struct bgmac *bgmac)
 static void bgmac_chip_reset(struct bgmac *bgmac)
 {
struct bcma_device *core = bgmac->core;
-   struct bcma_bus *bus = core->bus;
-   struct bcma_chipinfo *ci = &bus->chipinfo;
-   u32 flags;
+   u32 cmdcfg_sr;
u32 iost;
int i;
 
@@ -915,15 +920,12 @@ static void bgmac_chip_reset(struct bgmac *bgmac)
}
 
iost = bcma_aread32(core, BCMA_IOST);
-   if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg == BCMA_PKG_ID_BCM47186) 
||
-   (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg == 10) ||
-   (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg == 
BCMA_PKG_ID_BCM47188))
+   if (bgmac->feature_flags & BGMAC_FEAT_IOST_ATTACHED)
iost &= ~BGMAC_BCMA_IOST_ATTACHED;
 
/* 3GMAC: for BCM4707 & BCM47094, only do core reset at bgmac_probe() */
-   if (ci->id != BCMA_CHIP_ID_BCM4707 &&
-   ci->id != BCMA_CHIP_ID_BCM47094) {
-   flags = 0;
+   if 

[PATCH 1/7] net: ethernet: bgmac: change bgmac_* prints to dev_* prints

2016-06-30 Thread Jon Mason
The bgmac_* print wrappers call dev_* prints with the dev pointer from
the bcma core.  In anticipation of removing the bcma requirement for
this driver, these must be changed to not reference that struct.  So,
simply change all of the bgmac_* prints to their dev_* counterparts.  In
some cases netdev_* prints are more appropriate, so change those as
well.

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/bgmac.c | 103 +-
 drivers/net/ethernet/broadcom/bgmac.h |  14 +
 2 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index e6e74ca..37b3b68 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -50,7 +50,7 @@ static bool bgmac_wait_value(struct bcma_device *core, u16 
reg, u32 mask,
return true;
udelay(10);
}
-   pr_err("Timeout waiting for reg 0x%X\n", reg);
+   dev_err(&core->dev, "Timeout waiting for reg 0x%X\n", reg);
return false;
 }
 
@@ -84,8 +84,8 @@ static void bgmac_dma_tx_reset(struct bgmac *bgmac, struct 
bgmac_dma_ring *ring)
udelay(10);
}
if (i)
-   bgmac_err(bgmac, "Timeout suspending DMA TX ring 0x%X 
(BGMAC_DMA_TX_STAT: 0x%08X)\n",
- ring->mmio_base, val);
+   dev_err(bgmac->dev, "Timeout suspending DMA TX ring 0x%X 
(BGMAC_DMA_TX_STAT: 0x%08X)\n",
+   ring->mmio_base, val);
 
/* Remove SUSPEND bit */
bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL, 0);
@@ -93,13 +93,13 @@ static void bgmac_dma_tx_reset(struct bgmac *bgmac, struct 
bgmac_dma_ring *ring)
  ring->mmio_base + BGMAC_DMA_TX_STATUS,
  BGMAC_DMA_TX_STAT, BGMAC_DMA_TX_STAT_DISABLED,
  1)) {
-   bgmac_warn(bgmac, "DMA TX ring 0x%X wasn't disabled on time, 
waiting additional 300us\n",
-  ring->mmio_base);
+   dev_warn(bgmac->dev, "DMA TX ring 0x%X wasn't disabled on time, 
waiting additional 300us\n",
+ring->mmio_base);
udelay(300);
val = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_STATUS);
if ((val & BGMAC_DMA_TX_STAT) != BGMAC_DMA_TX_STAT_DISABLED)
-   bgmac_err(bgmac, "Reset of DMA TX ring 0x%X failed\n",
- ring->mmio_base);
+   dev_err(bgmac->dev, "Reset of DMA TX ring 0x%X 
failed\n",
+   ring->mmio_base);
}
 }
 
@@ -161,7 +161,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
int i;
 
if (skb->len > BGMAC_DESC_CTL1_LEN) {
-   bgmac_err(bgmac, "Too long skb (%d)\n", skb->len);
+   netdev_err(bgmac->net_dev, "Too long skb (%d)\n", skb->len);
goto err_drop;
}
 
@@ -174,7 +174,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
 * even when ring->end overflows
 */
if (ring->end - ring->start + nr_frags + 1 >= BGMAC_TX_RING_SLOTS) {
-   bgmac_err(bgmac, "TX ring is full, queue should be stopped!\n");
+   netdev_err(bgmac->net_dev, "TX ring is full, queue should be 
stopped!\n");
netif_stop_queue(net_dev);
return NETDEV_TX_BUSY;
}
@@ -241,8 +241,8 @@ err_dma:
}
 
 err_dma_head:
-   bgmac_err(bgmac, "Mapping error of skb on ring 0x%X\n",
- ring->mmio_base);
+   netdev_err(bgmac->net_dev, "Mapping error of skb on ring 0x%X\n",
+  ring->mmio_base);
 
 err_drop:
dev_kfree_skb(skb);
@@ -320,8 +320,8 @@ static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct 
bgmac_dma_ring *ring)
  ring->mmio_base + BGMAC_DMA_RX_STATUS,
  BGMAC_DMA_RX_STAT, BGMAC_DMA_RX_STAT_DISABLED,
  1))
-   bgmac_err(bgmac, "Reset of ring 0x%X RX failed\n",
- ring->mmio_base);
+   dev_err(bgmac->dev, "Reset of ring 0x%X RX failed\n",
+   ring->mmio_base);
 }
 
 static void bgmac_dma_rx_enable(struct bgmac *bgmac,
@@ -370,7 +370,7 @@ static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
dma_addr = dma_map_single(dma_dev, buf + BGMAC_RX_BUF_OFFSET,
  BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(dma_dev, dma_addr)) {
-   bgmac_err(bgmac, "DMA mapping error\n");
+   netdev_err(bgmac->net_dev, "DMA mapping error\n");
put_page(virt_to_head_page(buf));
return -ENOMEM;
}
@@ -465,16 +465,16 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct 
bgmac_dma_ring 

[PATCH 7/7] ARM: dts: NSP: Add bgmac entries

2016-06-30 Thread Jon Mason
Add device tree entries for the ethernet devices present on the
Broadcom Northstar Plus SoCs

Signed-off-by: Jon Mason 
---
 arch/arm/boot/dts/bcm-nsp.dtsi   | 18 ++
 arch/arm/boot/dts/bcm958625k.dts |  8 
 2 files changed, 26 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi
index def9e78..8f4343b 100644
--- a/arch/arm/boot/dts/bcm-nsp.dtsi
+++ b/arch/arm/boot/dts/bcm-nsp.dtsi
@@ -192,6 +192,24 @@
status = "disabled";
};
 
+   gmac0: ethernet@22000 {
+   compatible = "brcm,bgmac-nsp";
+   reg = <0x022000 0x1000>,
+ <0x11 0x1000>;
+   reg-names = "gmac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
+   gmac1: ethernet@23000 {
+   compatible = "brcm,bgmac-nsp";
+   reg = <0x023000 0x1000>,
+ <0x111000 0x1000>;
+   reg-names = "gmac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
nand: nand@26000 {
compatible = "brcm,nand-iproc", "brcm,brcmnand-v6.1";
reg = <0x026000 0x600>,
diff --git a/arch/arm/boot/dts/bcm958625k.dts b/arch/arm/boot/dts/bcm958625k.dts
index e298450..d16ab53 100644
--- a/arch/arm/boot/dts/bcm958625k.dts
+++ b/arch/arm/boot/dts/bcm958625k.dts
@@ -56,6 +56,14 @@
status = "okay";
 };
 
+&gmac0 {
+   status = "okay";
+};
+
+&gmac1 {
+   status = "okay";
+};
+
  {
status = "okay";
 };
-- 
1.9.1



[PATCH 0/7] net: ethernet: bgmac: Add platform device support

2016-06-30 Thread Jon Mason
Well, no one complained too loudly about the RFC version of this patch
series (see https://lkml.org/lkml/2016/6/28/863).  So, I'm officially
sending this out for inclusion.  All comments from the RFC were
addressed in this version.

This patch series adds support for other, non-bcma iProc SoC's to the
bgmac driver.  This series only adds NSP support, but we are interested
in adding support for the Cygnus and NS2 families (with more possible
down the road).

To support non-bcma enabled SoCs, we need to add the standard device
tree "platform device" support.  Unfortunately, this driver is very
tightly coupled with the bcma bus and much unwinding is needed.  I tried
to break this up into a number of patches to make it more obvious what
was being done to add platform device support.  I was able to verify
that the bcma code still works using a 53012K board (NS SoC), and that
the platform code works using a 58625K board (NSP SoC).

Thanks,
Jon

Jon Mason (7):
  net: ethernet: bgmac: change bgmac_* prints to dev_* prints
  net: ethernet: bgmac: add dma_dev pointer
  net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file
  net: ethernet: bgmac: convert to feature flags
  net: ethernet: bgmac: Add platform device support
  dt-bindings: net: bgmac: add bindings documentation for bgmac
  ARM: dts: NSP: Add bgmac entries

 .../devicetree/bindings/net/brcm,bgmac-nsp.txt |  24 +
 arch/arm/boot/dts/bcm-nsp.dtsi |  18 +
 arch/arm/boot/dts/bcm958625k.dts   |   8 +
 drivers/net/ethernet/broadcom/Kconfig  |  23 +-
 drivers/net/ethernet/broadcom/Makefile |   2 +
 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c| 266 +
 drivers/net/ethernet/broadcom/bgmac-bcma.c | 315 ++
 drivers/net/ethernet/broadcom/bgmac-platform.c | 210 +++
 drivers/net/ethernet/broadcom/bgmac.c  | 658 +
 drivers/net/ethernet/broadcom/bgmac.h  | 112 +++-
 10 files changed, 1120 insertions(+), 516 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c

-- 
1.9.1



[PATCH 2/7] net: ethernet: bgmac: add dma_dev pointer

2016-06-30 Thread Jon Mason
The DMA buffer allocation code, etc. references a dma_dev device pointer
from the bcma core.  In anticipation of removing the bcma requirement
for this driver, these must be changed to not reference that struct.
Add a dma_dev device pointer to the bgmac struct and reference that
instead.
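
To make the intent concrete, here is a minimal user-space sketch of the
pattern: each bus front end stores its DMA-capable device in the shared
driver struct at probe time, and the common code only ever reads that
field.  All toy_ types and functions are hypothetical stand-ins, and the
platform-side assignment is an assumption about how a later patch might
fill the field, not something shown in this patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct device and the two bus device types. */
struct toy_device { const char *name; };

struct toy_bcma_core {
	struct toy_device dev;
	struct toy_device *dma_dev;   /* bcma keeps a separate DMA device */
};

struct toy_platform_dev {
	struct toy_device dev;
};

/* Shared driver state: the common code reads only these two fields. */
struct toy_bgmac {
	struct toy_device *dev;
	struct toy_device *dma_dev;
};

/* bcma front end: copy the pointers out of the bus struct once. */
static void toy_bcma_probe(struct toy_bgmac *bgmac, struct toy_bcma_core *core)
{
	bgmac->dev = &core->dev;
	bgmac->dma_dev = core->dma_dev;
}

/* Platform front end (assumed): the device itself performs DMA. */
static void toy_platform_probe(struct toy_bgmac *bgmac,
			       struct toy_platform_dev *pdev)
{
	bgmac->dev = &pdev->dev;
	bgmac->dma_dev = &pdev->dev;
}

/* Common code: no knowledge of which bus filled the field. */
static struct toy_device *toy_dma_target(const struct toy_bgmac *bgmac)
{
	return bgmac->dma_dev;
}
```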

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/bgmac.c | 17 +
 drivers/net/ethernet/broadcom/bgmac.h |  1 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index 37b3b68..3614bd8 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -152,7 +152,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
struct bgmac_dma_ring *ring,
struct sk_buff *skb)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct net_device *net_dev = bgmac->net_dev;
int index = ring->end % BGMAC_TX_RING_SLOTS;
struct bgmac_slot_info *slot = &ring->slots[index];
@@ -254,7 +254,7 @@ err_drop:
 /* Free transmitted packets */
 static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
int empty_slot;
bool freed = false;
unsigned bytes_compl = 0, pkts_compl = 0;
@@ -351,7 +351,7 @@ static void bgmac_dma_rx_enable(struct bgmac *bgmac,
 static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
 struct bgmac_slot_info *slot)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
dma_addr_t dma_addr;
struct bgmac_rx_header *rx;
void *buf;
@@ -440,7 +440,7 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct 
bgmac_dma_ring *ring,
end_slot /= sizeof(struct bgmac_dma_desc);
 
while (ring->start != end_slot) {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_slot_info *slot = &ring->slots[ring->start];
struct bgmac_rx_header *rx = slot->buf + BGMAC_RX_BUF_OFFSET;
struct sk_buff *skb;
@@ -543,7 +543,7 @@ static bool bgmac_dma_unaligned(struct bgmac *bgmac,
 static void bgmac_dma_tx_ring_free(struct bgmac *bgmac,
   struct bgmac_dma_ring *ring)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_dma_desc *dma_desc = ring->cpu_base;
struct bgmac_slot_info *slot;
int i;
@@ -569,7 +569,7 @@ static void bgmac_dma_tx_ring_free(struct bgmac *bgmac,
 static void bgmac_dma_rx_ring_free(struct bgmac *bgmac,
   struct bgmac_dma_ring *ring)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_slot_info *slot;
int i;
 
@@ -590,7 +590,7 @@ static void bgmac_dma_ring_desc_free(struct bgmac *bgmac,
 struct bgmac_dma_ring *ring,
 int num_slots)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
int size;
 
if (!ring->cpu_base)
@@ -628,7 +628,7 @@ static void bgmac_dma_free(struct bgmac *bgmac)
 
 static int bgmac_dma_alloc(struct bgmac *bgmac)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_dma_ring *ring;
static const u16 ring_base[] = { BGMAC_DMA_BASE0, BGMAC_DMA_BASE1,
 BGMAC_DMA_BASE2, BGMAC_DMA_BASE3, };
@@ -1701,6 +1701,7 @@ static int bgmac_probe(struct bcma_device *core)
net_dev->ethtool_ops = &bgmac_ethtool_ops;
bgmac = netdev_priv(net_dev);
bgmac->dev = &core->dev;
+   bgmac->dma_dev = core->dma_dev;
bgmac->net_dev = net_dev;
bgmac->core = core;
bcma_set_drvdata(core, bgmac);
diff --git a/drivers/net/ethernet/broadcom/bgmac.h 
b/drivers/net/ethernet/broadcom/bgmac.h
index abb9dd8..fd20018 100644
--- a/drivers/net/ethernet/broadcom/bgmac.h
+++ b/drivers/net/ethernet/broadcom/bgmac.h
@@ -429,6 +429,7 @@ struct bgmac {
struct bcma_device *cmn; /* Reference to CMN core for BCM4706 */
 
struct device *dev;
+   struct device *dma_dev;
struct net_device *net_dev;
struct napi_struct napi;
struct mii_bus *mii_bus;
-- 
1.9.1



[PATCH 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-06-30 Thread Jon Mason
Signed-off-by: Jon Mason 
---
 .../devicetree/bindings/net/brcm,bgmac-nsp.txt | 24 ++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt

diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt 
b/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
new file mode 100644
index 000..022946c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
@@ -0,0 +1,24 @@
+Broadcom GMAC Ethernet Controller Device Tree Bindings
+-
+
+Required properties:
+ - compatible: "brcm,bgmac-nsp"
+ - reg:Address and length of the GMAC registers,
+   Address and length of the GMAC IDM registers
+ - reg-names:  Names of the registers.  Must have both "gmac_base" and
+   "idm_base"
+ - interrupts: Interrupt number
+
+Optional properties:
+- mac-address: See ethernet.txt file in the same directory
+
+Examples:
+
+gmac0: ethernet@18022000 {
+   compatible = "brcm,bgmac-nsp";
+   reg = <0x18022000 0x1000>,
+ <0x1811 0x1000>;
+   reg-names = "gmac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+};
-- 
1.9.1



Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Cong Wang
On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann  wrote:
>
> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
> to the header then? Both seem similarly small at least (could be split
> f.e into two patches then, first for the move, second for the actual fix).

No objection from me. Please feel free to send a patch. ;)


[PATCH] mwifiex: mask PCIe interrupts before removal

2016-06-30 Thread Brian Norris
The PCIe driver didn't mask the host interrupts before trying to tear
down. This causes lockups at reboot or rmmod when using MSI-X on 8997,
since the MSI handler gets confused and locks up the system.

Also tested on 8897, which does not support MSI-X (and wasn't
experiencing this same bug). No regressions seen there.

Signed-off-by: Brian Norris 
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c 
b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 0c7937eb6b77..af98371dc2af 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -440,6 +440,11 @@ static int mwifiex_pcie_disable_host_int(struct 
mwifiex_adapter *adapter)
return 0;
 }
 
+static void mwifiex_pcie_disable_host_int_noerr(struct mwifiex_adapter 
*adapter)
+{
+   WARN_ON(mwifiex_pcie_disable_host_int(adapter));
+}
+
 /*
  * This function enables the host interrupt.
  *
@@ -2945,6 +2950,7 @@ static struct mwifiex_if_ops pcie_ops = {
.register_dev = mwifiex_register_dev,
.unregister_dev =   mwifiex_unregister_dev,
.enable_int =   mwifiex_pcie_enable_host_int,
+   .disable_int =  mwifiex_pcie_disable_host_int_noerr,
.process_int_status =   mwifiex_process_int_status,
.host_to_card = mwifiex_pcie_host_to_card,
.wakeup =   mwifiex_pm_wakeup_card,
-- 
2.8.0.rc3.226.g39d4020



Re: ethtool needs a new maintainer

2016-06-30 Thread John Fastabend
On 16-06-30 11:15 AM, John W. Linville wrote:
> On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
>> On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
>>> I've become steadily less enthusiastic and less responsive as a
>>> maintainer over the past year or so.  I no longer work on networking
>>> regularly, so it takes a lot more time to get into the right state of
>>> mind to think about ethtool code, while I have other demands on my time
>>> that tend to take priority.
>>>
>>> So, I would like to find a new maintainer to take over as soon as
>>> possible.  Ideally the new maintainer would have previous contributions
>>> to ethtool and an existing account on kernel.org so that they can push
>>> to the git repository and the home page.  But neither of those is
>>> essential.  Please reply if you're interested.
>>
>> I would like to take this responsibility. My previous contributions
>> to ethtool are meager, but I think my skills and interests are suited
>> to the task.  Plus, I already have a kernel.org account... :-)
> 
> Are there any other takers?  Or is this a done deal?
> 
> John
> 

+1 for having John take it on :)

.JohnF


Re: [RFC 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-06-30 Thread Jon Mason
On Thu, Jun 30, 2016 at 2:06 PM, Ray Jui  wrote:
> Hi Jon,
>
> On 6/28/2016 12:34 PM, Jon Mason wrote:
>>
>> Signed-off-by: Jon Mason 
>> ---
>>  .../devicetree/bindings/net/brcm,bgmac-enet.txt | 21
>> +
>>  1 file changed, 21 insertions(+)
>>  create mode 100644
>> Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>> b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>> new file mode 100644
>> index 000..efd36d5
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>> @@ -0,0 +1,21 @@
>> +Broadcom GMAC Ethernet Controller Device Tree Bindings
>> +-
>> +
>> +Required properties:
>> + - compatible: "brcm,bgmac-enet"
>> + - reg:Address and length of the GMAC registers,
>> +   Address and length of the GMAC IDM registers
>
>
> As we know there will be additional optional register banks required for
> some of the other SoCs that the current driver has not yet supported. In my
> opinion, we should consider to make "reg-names" a mandatory property now and
> map the register blocks based on names.
>
> I think this will help to make our life easier in the future when new
> optional SoC specific register blocks are added, such that we can map the
> register blocks based on names instead of indices, which will change and be
> different among different SoCs and will require much more complex logic in
> the driver to deal with.

I don't have any objection to this.  I'll tweak the patches to do it by name.
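
For what it's worth, the difference between the two lookups can be
sketched in plain C.  This is a user-space model under stated
assumptions: struct toy_reg_block and toy_find_reg() are hypothetical,
and in the driver itself the equivalent lookup would presumably go
through platform_get_resource_byname().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one "reg" entry paired with its reg-names string. */
struct toy_reg_block {
	const char *name;
	uint32_t base;
	uint32_t size;
};

/*
 * Look a register block up by name.  Returns NULL when the block is
 * absent, which is how an optional SoC-specific bank would show up.
 */
static const struct toy_reg_block *
toy_find_reg(const struct toy_reg_block *blocks, size_t count, const char *name)
{
	for (size_t i = 0; i < count; i++)
		if (strcmp(blocks[i].name, name) == 0)
			return &blocks[i];
	return NULL;
}
```

If one SoC lists an extra block between "gmac_base" and "idm_base", an
index-based lookup of "idm_base" would have to know about that SoC,
while the name-based lookup above keeps working unchanged.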

>
>> + - interrupts: Interrupt number
>> +
>> +Optional properties:
>> +- mac-address: mac address to be assigned to the device
>> +
>> +Examples:
>> +
>> +gmac0: enet@18022000 {
>> +   compatible = "brcm,bgmac-enet";
>> +   reg = <0x18022000 0x1000>,
>> + <0x1811 0x1000>;
>> +   interrupts = ;
>> +   status = "disabled";
>> +};
>>
>
> Btw, I think Rob Herring should be included in the review for device tree
> binding document changes.

Thanks, I'll add him and the other DT maintainers when I send this out
as a "PATCH" shortly.

Thanks,
Jon

>
> Thanks,
>
> Ray


Re: [RFC PATCH] ila: Resolver mechanism

2016-06-30 Thread Thomas Graf
On 06/30/16 at 12:41pm, Tom Herbert wrote:
> This is not yet complete, we would still need some controls
> to rate limit the number of resolution requests and a means to track
> pending requests. I'm posting this as RFC because it seems like
> this might be part of a general mechanism to perform address
> resolution in userspace and I would appreciate comments with
> regard to that.

I wouldn't mind having the rate limiting done as generic route
attribute so it could be applied to non-ILA routes as well.

> 
> diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
> index a478fe8..d880e49 100644
> --- a/include/uapi/linux/lwtunnel.h
> +++ b/include/uapi/linux/lwtunnel.h
> @@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
>   LWTUNNEL_ENCAP_IP,
>   LWTUNNEL_ENCAP_ILA,
>   LWTUNNEL_ENCAP_IP6,
> + LWTUNNEL_ENCAP_ILA_NOTIFY,
>   __LWTUNNEL_ENCAP_MAX,
>  };

Neat.

> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 262f037..271215f 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -144,6 +144,9 @@ enum {
>   RTM_GETSTATS = 94,
>  #define RTM_GETSTATS RTM_GETSTATS
>  
> + RTM_ADDR_RESOLVE = 95,
> +#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
> +

I realize this is currently only kernel->user but let's plan ahead.
Each RTM_ group should start aligned to 4 with types specified in
the order new, del, get, set. RTM_ADDR_RESOLVE probably maps best
to NEW in terms of behaviour. See the magic around 'kind' in
rtnetlink_rcv_msg().
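
To spell out the arithmetic: the sketch below mimics how the "kind" of a
message is derived from its type, as I understand rtnetlink_rcv_msg() to
do it (subtract RTM_BASE, then take the low two bits).  The toy_ helper
names are made up for illustration; RTM_BASE and RTM_GETSTATS are the
uapi values, and 95 is the value proposed in the RFC.

```c
/* Values from include/uapi/linux/rtnetlink.h; 95 is the RFC's proposal. */
#define RTM_BASE          16
#define RTM_GETSTATS      94
#define RTM_ADDR_RESOLVE  95

/* Position within a group of four: new, del, get, set. */
enum toy_rtm_kind {
	TOY_KIND_NEW = 0,
	TOY_KIND_DEL = 1,
	TOY_KIND_GET = 2,
	TOY_KIND_SET = 3,
};

/* Hypothetical helper mirroring the "kind = type & 3" computation. */
static int toy_rtm_kind(int msg_type)
{
	return (msg_type - RTM_BASE) & 3;
}

/* Hypothetical helper: first type value of the group after last_type. */
static int toy_next_group_start(int last_type)
{
	return RTM_BASE + (((last_type - RTM_BASE) | 3) + 1);
}
```

With these numbers, 95 lands in the "set" slot of the RTM_*STATS group,
while the next aligned group starts at 96, whose first slot is "new".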


[PATCH net] macsec: set actual real device for xmit when !protect_frames

2016-06-30 Thread Daniel Borkmann
Avoid recursions of dev_queue_xmit() to the wrong net device when
frames are unprotected: at that time skb->dev still points to our own
macsec dev and, unlike in macsec_encrypt_finish(), the dev pointer
doesn't get updated to the real underlying device.
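
A toy user-space model of the problem, for illustration only: the
toy_ structs and toy_queue_xmit() are hypothetical and heavily
simplified, and the depth cap merely stands in for whatever stops the
recursion in a real stack.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: a virtual device has a lower (real) device. */
struct toy_dev {
	const char *name;
	struct toy_dev *lower;   /* NULL for a real device */
	int xmit_calls;
};

/* Hypothetical packet: transmit always dispatches on skb->dev. */
struct toy_skb {
	struct toy_dev *dev;
};

#define TOY_MAX_DEPTH 8

/*
 * Returns 0 once the packet reaches a real device, -1 if the call
 * chain keeps re-entering the virtual device.  "retarget" models the
 * one-line fix: point skb->dev at the lower device before requeueing.
 */
static int toy_queue_xmit(struct toy_skb *skb, int depth, bool retarget)
{
	struct toy_dev *dev = skb->dev;

	if (depth > TOY_MAX_DEPTH)
		return -1;

	dev->xmit_calls++;
	if (!dev->lower)
		return 0;               /* real device: packet is sent */

	if (retarget)
		skb->dev = dev->lower;  /* the equivalent of the fix */

	return toy_queue_xmit(skb, depth + 1, retarget);
}
```

Without the retarget step the packet never reaches the lower device in
this model; with it, each device sees the packet exactly once.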

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Daniel Borkmann 
Acked-by: Sabrina Dubroca 
---
 drivers/net/macsec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 0e7eff7..8bcd78f 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -2640,6 +2640,7 @@ static netdev_tx_t macsec_start_xmit(struct sk_buff *skb,
u64_stats_update_begin(&secy_stats->syncp);
secy_stats->stats.OutPktsUntagged++;
u64_stats_update_end(&secy_stats->syncp);
+   skb->dev = macsec->real_dev;
len = skb->len;
ret = dev_queue_xmit(skb);
count_tx(dev, ret, len);
-- 
1.9.3



Re: ethtool needs a new maintainer

2016-06-30 Thread John W. Linville
On Thu, Jun 30, 2016 at 02:37:30PM -0500, Jorge Alberto Garcia wrote:
> El 30/06/2016 02:32 p.m., "Ben Hutchings"  escribió:
> >
> > On Thu, 2016-06-30 at 14:27 -0500, Jorge Alberto Garcia wrote:
> > > On Thu, Jun 30, 2016 at 1:15 PM, John W. Linville
> > >  wrote:
> > > > On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
> > > > > On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > > > > > I've become steadily less enthusiastic and less responsive as a
> > > > > > maintainer over the past year or so.  I no longer work on
> networking
> > > > > > regularly, so it takes a lot more time to get into the right
> state of
> > > > > > mind to think about ethtool code, while I have other demands on
> my time
> > > > > > that tend to take priority.
> > > > > >
> > > > > > So, I would like to find a new maintainer to take over as soon as
> > > > > > possible.  Ideally the new maintainer would have previous
> contributions
> > > > > > to ethtool and an existing account on kernel.org so that they can
> push
> > > > > > to the git repository and the home page.  But neither of those is
> > > > > > essential.  Please reply if you're interested.
> > > > >
> > > > > I would like to take this responsibility. My previous contributions
> > > > > to ethtool are meager, but I think my skills and interests are
> suited
> > > > > to the task.  Plus, I already have a kernel.org account... :-)
> > > >
> > > > Are there any other takers?  Or is this a done deal?
> > > >
> > >
> > > hi guys !, any link to a bugzilla  / patchwork  ?
> >
> > There's nothing as organised as that, though it might be possible to
> > add categories for ethtool on  and
> > .
> >
> > Ben.
> >
> I would like to help but it will be a first for me.
> Maybe in shadow mode ?

Honestly, I don't expect the patch management stuff to be much of a
burden.  I could always use the help reviewing any patches submitted,
of course!

John
-- 
John W. Linville                Someday the world will need a hero, and you
linvi...@tuxdriver.com          might be all we have.  Be ready.


Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))

2016-06-30 Thread Steven Rostedt
On Thu, 30 Jun 2016 16:07:26 -0400
Steven Rostedt  wrote:

> I can reproduce this by having the client unmount and remount the
> directory.

It gets even more interesting. When I unmount the directory, the hidden
port does not go away. It is still there. But if I mount it again, it
goes away (until it times out again).

Even more info:

When I first mount it, it creates 3 sockets, where one immediately is
closed:

tcp        0      0 192.168.23.9:892        192.168.23.22:44672     TIME_WAIT   -
tcp        0      0 192.168.23.9:2049       192.168.23.22:815       ESTABLISHED -
tcp        0      0 192.168.23.9:754        192.168.23.22:44672     ESTABLISHED -

(192.168.23.22 is the machine remotely mounting a directory from the
server 192.168.23.9)

The trace of port 892 is this:

   kworker/u32:1-13473 [000]   4093.915114: xs_setup_tcp: RPC:   set up xprt to 192.168.23.22 (port 44672) via tcp
   kworker/u32:1-13473 [000]   4093.915122: xprt_create_transport: RPC:   created transport 8803b1c38000 with 65536 slots
    kworker/0:1H-129   [000]   4093.915152: xprt_alloc_slot: RPC:47 reserved req 88040b27ca00 xid c50ccaff
    kworker/0:1H-129   [000]   4093.915157: xprt_connect: RPC:47 xprt_connect xprt 8803b1c38000 is not connected
    kworker/0:1H-129   [000]   4093.915159: xs_connect: RPC:   xs_connect scheduled xprt 8803b1c38000
    kworker/0:1H-129   [000] ..s.  4093.915170: inet_csk_get_port: snum 892
    kworker/0:1H-129   [000] ..s.  4093.915177: <stack trace>
 => sched_clock
 => inet_addr_type_table
 => security_capable
 => inet_bind
 => xs_bind
 => release_sock
 => sock_setsockopt
 => __sock_create
 => xs_create_sock.isra.19
 => xs_tcp_setup_socket
 => process_one_work
 => worker_thread
 => worker_thread
 => kthread
 => ret_from_fork
 => kthread
    kworker/0:1H-129   [000] ..s.  4093.915178: inet_bind_hash: add 892 8803bb9b5cc0
    kworker/0:1H-129   [000] ..s.  4093.915184: <stack trace>
 => inet_csk_get_port
 => sched_clock
 => inet_addr_type_table
 => security_capable
 => inet_bind
 => xs_bind
 => release_sock
 => sock_setsockopt
 => __sock_create
 => xs_create_sock.isra.19
 => xs_tcp_setup_socket
 => process_one_work
 => worker_thread
 => worker_thread
 => kthread
 => ret_from_fork
 => kthread
    kworker/0:1H-129   [000]   4093.915185: xs_bind: RPC:   xs_bind 4.136.255.255:892: ok (0)
    kworker/0:1H-129   [000]   4093.915186: xs_tcp_setup_socket: RPC:   worker connecting xprt 8803b1c38000 via tcp to 192.168.23.22 (port 44672)
    kworker/0:1H-129   [000]   4093.915221: xs_tcp_setup_socket: RPC:   8803b1c38000 connect status 115 connected 0 sock state 2
          <idle>-0     [003] ..s.  4093.915434: xs_tcp_state_change: RPC:   xs_tcp_state_change client 8803b1c38000...
          <idle>-0     [003] ..s.  4093.915435: xs_tcp_state_change: RPC:   state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
    kworker/3:1H-145   [003]   4093.915558: xprt_connect_status: RPC:47 xprt_connect_status: retrying
    kworker/3:1H-145   [003]   4093.915560: xprt_prepare_transmit: RPC:47 xprt_prepare_transmit
    kworker/3:1H-145   [003]   4093.915562: xprt_transmit: RPC:47 xprt_transmit(72)
    kworker/3:1H-145   [003]   4093.915588: xs_tcp_send_request: RPC:   xs_tcp_send_request(72) = 0
    kworker/3:1H-145   [003]   4093.915589: xprt_transmit: RPC:47 xmit complete
          <idle>-0     [003] ..s.  4093.915969: xs_tcp_data_ready: RPC:   xs_tcp_data_ready...
    kworker/3:1H-145   [003]   4093.916081: xs_tcp_data_recv: RPC:   xs_tcp_data_recv started
    kworker/3:1H-145   [003]   4093.916083: xs_tcp_data_recv: RPC:   reading TCP record fragment of length 24
    kworker/3:1H-145   [003]   4093.916084: xs_tcp_data_recv: RPC:   reading XID (4 bytes)
    kworker/3:1H-145   [003]   4093.916085: xs_tcp_data_recv: RPC:   reading request with XID c50ccaff
    kworker/3:1H-145   [003]   4093.916086: xs_tcp_data_recv: RPC:   reading CALL/REPLY flag (4 bytes)
    kworker/3:1H-145   [003]   4093.916087: xs_tcp_data_recv: RPC:   read reply XID c50ccaff
    kworker/3:1H-145   [003] ..s.  4093.916088: xs_tcp_data_recv: RPC:   XID c50ccaff read 16 bytes
    kworker/3:1H-145   [003] ..s.  4093.916089: xs_tcp_data_recv: RPC:   xprt = 8803b1c38000, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24
    kworker/3:1H-145   [003] ..s.  4093.916090: xprt_complete_rqst: RPC:47 xid c50ccaff complete (24 bytes received)
    kworker/3:1H-145   [003]   4093.916091: xs_tcp_data_recv: RPC:   xs_tcp_data_recv done
    kworker/3:1H-145   [003]   4093.916098: xprt_release: RPC:47 release request 88040b27ca00
   kworker/u32:1-13473 [002]   4093.976056: xprt_destroy: RPC:   destroying transport 8803b1c38000
   kworker/u32:1-13473 [002]   

Re: [RFC 5/7] net: ethernet: bgmac: Add platform device support

2016-06-30 Thread Jon Mason
On Thu, Jun 30, 2016 at 1:58 PM, Ray Jui  wrote:
> Hi Jon,
>
>
> On 6/28/2016 12:34 PM, Jon Mason wrote:
>>
>> The bcma portion of the driver has been split off into a bcma specific
>> driver.  This has been mirrored for the platform driver.  The last
>> references to the bcma core struct have been changed into a generic
>> function call.  These function calls are wrappers to either the original
>> bcma code or new platform functions that access the same areas via MMIO.
>> This necessitated adding function pointers for both platform and bcma to
>> hide which backend is being used from the generic bgmac code.
>>
>> Signed-off-by: Jon Mason 
>> ---
>>  drivers/net/ethernet/broadcom/Kconfig  |  23 +-
>>  drivers/net/ethernet/broadcom/Makefile |   4 +-
>>  drivers/net/ethernet/broadcom/bgmac-bcma.c | 315
>> 
>>  drivers/net/ethernet/broadcom/bgmac-platform.c | 208 
>>  drivers/net/ethernet/broadcom/bgmac.c  | 327
>> -
>>  drivers/net/ethernet/broadcom/bgmac.h  |  73 +-
>>  6 files changed, 666 insertions(+), 284 deletions(-)
>>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
>>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c
>>
>> diff --git a/drivers/net/ethernet/broadcom/Kconfig
>> b/drivers/net/ethernet/broadcom/Kconfig
>> index d74a92e..bd8c80c 100644
>> --- a/drivers/net/ethernet/broadcom/Kconfig
>> +++ b/drivers/net/ethernet/broadcom/Kconfig
>> @@ -140,10 +140,18 @@ config BNX2X_SRIOV
>>   allows for virtual function acceleration in virtual
>> environments.
>>
>>  config BGMAC
>> -   tristate "BCMA bus GBit core support"
>> +   tristate
>> +   help
>> + This enables the integrated ethernet controller support for many
>> + Broadcom (mostly iProc) SoCs. An appropriate bus interface
>> driver
>> + needs to be enabled to select this.
>> +
>> +config BGMAC_BCMA
>> +   tristate "Broadcom iProc GBit BCMA support"
>> depends on BCMA && BCMA_HOST_SOC
>> depends on HAS_DMA
>> depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
>> +   select BGMAC
>> select PHYLIB
>> select FIXED_PHY
>> ---help---
>> @@ -152,6 +160,19 @@ config BGMAC
>>   In case of using this driver on BCM4706 it's also requires to
>> enable
>>   BCMA_DRIVER_GMAC_CMN to make it work.
>>
>> +config BGMAC_PLATFORM
>> +   tristate "Broadcom iProc GBit platform support"
>> +   depends on HAS_DMA
>> +   depends on ARCH_BCM_IPROC || COMPILE_TEST
>> +   depends on OF
>> +   select BGMAC
>> +   select PHYLIB
>> +   select FIXED_PHY
>> +   default ARCH_BCM_IPROC
>> +   ---help---
>> + Say Y here if you want to use the Broadcom iProc Gigabit
>> Ethernet
>> + controller through the generic platform interface
>> +
>>  config SYSTEMPORT
>> tristate "Broadcom SYSTEMPORT internal MAC support"
>> depends on OF
>> diff --git a/drivers/net/ethernet/broadcom/Makefile
>> b/drivers/net/ethernet/broadcom/Makefile
>> index f559794..79f2372 100644
>> --- a/drivers/net/ethernet/broadcom/Makefile
>> +++ b/drivers/net/ethernet/broadcom/Makefile
>> @@ -10,6 +10,8 @@ obj-$(CONFIG_CNIC) += cnic.o
>>  obj-$(CONFIG_BNX2X) += bnx2x/
>>  obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
>>  obj-$(CONFIG_TIGON3) += tg3.o
>> -obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
>> +obj-$(CONFIG_BGMAC) += bgmac.o
>> +obj-$(CONFIG_BGMAC_BCMA) += bgmac-bcma.o bgmac-bcma-mdio.o
>> +obj-$(CONFIG_BGMAC_PLATFORM) += bgmac-platform.o
>>  obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
>>  obj-$(CONFIG_BNXT) += bnxt/
>> diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c
>> b/drivers/net/ethernet/broadcom/bgmac-bcma.c
>> new file mode 100644
>> index 000..9a9745c4
>> --- /dev/null
>> +++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
>> @@ -0,0 +1,315 @@
>> +/*
>> + * Driver for (BCM4706)? GBit MAC core on BCMA bus.
>> + *
>> + * Copyright (C) 2012 Rafał Miłecki 
>> + *
>> + * Licensed under the GNU/GPL. See COPYING for details.
>> + */
>> +
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include "bgmac.h"
>> +
>> +static inline bool bgmac_is_bcm4707_family(struct bcma_device *core)
>> +{
>> +   switch (core->bus->chipinfo.id) {
>> +   case BCMA_CHIP_ID_BCM4707:
>> +   case BCMA_CHIP_ID_BCM47094:
>> +   case BCMA_CHIP_ID_BCM53018:
>> +   return true;
>> +   default:
>> +   return false;
>> +   }
>> +}
>> +
>> +/**
>> + * BCMA bus ops
>> + **/
>> +
>> +static u32 bcma_bgmac_read(struct bgmac *bgmac, u16 offset)
>> +{
>> +   return bcma_read32(bgmac->bcma.core, offset);
>> +}
>> +
>> +static void bcma_bgmac_write(struct 

[PATCH nf-next 3/3] netfilter: replace list_head with single linked list

2016-06-30 Thread Aaron Conole
The netfilter hook list never uses the prev pointer, and so can be
trimmed to be a smaller singly-linked list.

In addition to having a more lightweight structure for hook traversal,
struct net becomes 5568 bytes (down from 6400) and struct net_device
becomes 2176 bytes (down from 2240).

Signed-off-by: Aaron Conole 
Signed-off-by: Florian Westphal 
---
 include/linux/netdevice.h |   2 +-
 include/linux/netfilter.h |  18 +++---
 include/linux/netfilter_ingress.h |  14 +++--
 include/net/netfilter/nf_queue.h  |   9 ++-
 include/net/netns/netfilter.h |   2 +-
 net/bridge/br_netfilter_hooks.c   |  21 +++
 net/netfilter/core.c  | 126 --
 net/netfilter/nf_internals.h  |  10 +--
 net/netfilter/nf_queue.c  |  15 +++--
 net/netfilter/nfnetlink_queue.c   |   5 +-
 10 files changed, 129 insertions(+), 93 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e84d9d2..8235f67 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1747,7 +1747,7 @@ struct net_device {
 #endif
struct netdev_queue __rcu *ingress_queue;
 #ifdef CONFIG_NETFILTER_INGRESS
-   struct list_head nf_hooks_ingress;
+   struct nf_hook_entry __rcu *nf_hooks_ingress;
 #endif
 
unsigned char   broadcast[MAX_ADDR_LEN];
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index ad444f0..3390a84 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -55,12 +55,12 @@ struct nf_hook_state {
struct net_device *out;
struct sock *sk;
struct net *net;
-   struct list_head *hook_list;
+   struct nf_hook_entry *hook_list;
int (*okfn)(struct net *, struct sock *, struct sk_buff *);
 };
 
 static inline void nf_hook_state_init(struct nf_hook_state *p,
- struct list_head *hook_list,
+ struct nf_hook_entry *hook_list,
  unsigned int hook,
  int thresh, u_int8_t pf,
  struct net_device *indev,
@@ -97,6 +97,12 @@ struct nf_hook_ops {
int priority;
 };
 
+struct nf_hook_entry {
+   struct nf_hook_entry __rcu  *next;
+   struct nf_hook_ops  ops;
+   const struct nf_hook_ops  *orig_ops;
+};
+
 struct nf_sockopt_ops {
struct list_head list;
 
@@ -161,8 +167,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int 
hook,
 int (*okfn)(struct net *, struct sock *, 
struct sk_buff *),
 int thresh)
 {
-   struct list_head *hook_list;
-
 #ifdef HAVE_JUMP_LABEL
if (__builtin_constant_p(pf) &&
__builtin_constant_p(hook) &&
@@ -170,14 +174,14 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned 
int hook,
return 1;
 #endif
 
-   hook_list = &net->nf.hooks[pf][hook];
-
-   if (!list_empty(hook_list)) {
+   if (rcu_access_pointer(net->nf.hooks[pf][hook])) {
+   struct nf_hook_entry *hook_list;
struct nf_hook_state state;
int ret;
 
/* We may already have this, but read-locks nest anyway */
rcu_read_lock();
+   hook_list = rcu_dereference(net->nf.hooks[pf][hook]);
nf_hook_state_init(&state, hook_list, hook, thresh,
   pf, indev, outdev, sk, net, okfn);
 
diff --git a/include/linux/netfilter_ingress.h 
b/include/linux/netfilter_ingress.h
index 6965ba0..e3e3f6d 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -11,23 +11,27 @@ static inline bool nf_hook_ingress_active(const struct 
sk_buff *skb)
if 
(!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_INGRESS]))
return false;
 #endif
-   return !list_empty(&skb->dev->nf_hooks_ingress);
+   return rcu_access_pointer(skb->dev->nf_hooks_ingress) != NULL;
 }
 
 /* caller must hold rcu_read_lock */
 static inline int nf_hook_ingress(struct sk_buff *skb)
 {
+   struct nf_hook_entry *e = rcu_dereference(skb->dev->nf_hooks_ingress);
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, &skb->dev->nf_hooks_ingress,
-  NF_NETDEV_INGRESS, INT_MIN, NFPROTO_NETDEV,
-  skb->dev, NULL, NULL, dev_net(skb->dev), NULL);
+   if (unlikely(!e))
+   return 0;
+
+   nf_hook_state_init(&state, e, NF_NETDEV_INGRESS, INT_MIN,
+  NFPROTO_NETDEV, skb->dev, NULL, NULL,
+  dev_net(skb->dev), NULL);
return nf_hook_slow(skb, &state);
 }
 
 static inline void nf_hook_ingress_init(struct net_device *dev)
 {
-   INIT_LIST_HEAD(&dev->nf_hooks_ingress);
+   RCU_INIT_POINTER(dev->nf_hooks_ingress, 

[PATCH nf-next 1/3] netfilter: bridge: add and use br_nf_hook_thresh

2016-06-30 Thread Aaron Conole
From: Florian Westphal 

This replaces the last uses of NF_HOOK_THRESH().
Followup patch will remove it and rename nf_hook_thresh.

The reason is that inet (non-bridge) netfilter no longer invokes the
hooks from hooks, so we no longer need the thresh value to skip hooks
with a lower priority.

The bridge netfilter however may need to do this. br_nf_hook_thresh is a
wrapper that is supposed to do this, i.e. only call hooks with a
priority that exceeds NF_BR_PRI_BRNF.

It's used only in the recursion cases of br_netfilter.
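
For illustration only, the threshold idea described above can be sketched in plain user-space C. This is not the kernel implementation: struct hook, run_hooks_above() and the sample callbacks are hypothetical names, and the sketch assumes the hooks are kept in an array sorted by ascending priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered hook; not the kernel's nf_hook_ops. */
struct hook {
	int priority;
	int (*fn)(int *pkt);	/* returns 1 to accept, 0 to drop */
};

/*
 * Walk hooks[] (sorted by ascending priority) and call only the hooks whose
 * priority is strictly greater than thresh. Hooks at or below the threshold
 * are assumed to have run already on an earlier pass.
 * Returns 1 if every called hook accepted, 0 at the first drop.
 */
static int run_hooks_above(const struct hook *hooks, size_t n, int thresh,
			   int *pkt)
{
	size_t i;

	assert(hooks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (hooks[i].priority <= thresh)
			continue;	/* skip: at or below the threshold */
		if (!hooks[i].fn(pkt))
			return 0;	/* a hook dropped the packet */
	}
	return 1;
}

/* Sample callbacks used only to make the effect observable. */
static int add_one(int *pkt) { *pkt += 1; return 1; }
static int add_ten(int *pkt) { *pkt += 10; return 1; }
static int drop(int *pkt) { (void)pkt; return 0; }
```

With a threshold of 0, a hook registered at priority 0 is skipped and only hooks with a higher priority run, which is the behaviour the commit message asks of the wrapper.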

Signed-off-by: Florian Westphal 
Signed-off-by: Aaron Conole 
---
 include/net/netfilter/br_netfilter.h |  6 
 net/bridge/br_netfilter_hooks.c  | 57 ++--
 net/bridge/br_netfilter_ipv6.c   | 12 
 3 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/include/net/netfilter/br_netfilter.h 
b/include/net/netfilter/br_netfilter.h
index e8d1448..0b0c35c 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -15,6 +15,12 @@ static inline struct nf_bridge_info *nf_bridge_alloc(struct 
sk_buff *skb)
 
 void nf_bridge_update_protocol(struct sk_buff *skb);
 
+int br_nf_hook_thresh(unsigned int hook, struct net *net, struct sock *sk,
+ struct sk_buff *skb, struct net_device *indev,
+ struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *,
+ struct sk_buff *));
+
 static inline struct nf_bridge_info *
 nf_bridge_info_get(const struct sk_buff *skb)
 {
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 2d25979..19f230c 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -395,11 +396,10 @@ bridged_dnat:
skb->dev = nf_bridge->physindev;
nf_bridge_update_protocol(skb);
nf_bridge_push_encap_header(skb);
-   NF_HOOK_THRESH(NFPROTO_BRIDGE,
-  NF_BR_PRE_ROUTING,
-  net, sk, skb, skb->dev, NULL,
-  br_nf_pre_routing_finish_bridge,
-  1);
+   br_nf_hook_thresh(NF_BR_PRE_ROUTING,
+ net, sk, skb, skb->dev,
+ NULL,
+ br_nf_pre_routing_finish);
return 0;
}
ether_addr_copy(eth_hdr(skb)->h_dest, dev->dev_addr);
@@ -417,10 +417,8 @@ bridged_dnat:
skb->dev = nf_bridge->physindev;
nf_bridge_update_protocol(skb);
nf_bridge_push_encap_header(skb);
-   NF_HOOK_THRESH(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, net, sk, skb,
-  skb->dev, NULL,
-  br_handle_frame_finish, 1);
-
+   br_nf_hook_thresh(NF_BR_PRE_ROUTING, net, sk, skb, skb->dev, NULL,
+ br_handle_frame_finish);
return 0;
 }
 
@@ -992,6 +990,47 @@ static struct notifier_block brnf_notifier __read_mostly = 
{
.notifier_call = brnf_device_event,
 };
 
+/* recursively invokes nf_hook_slow (again), skipping already-called
+ * hooks (< NF_BR_PRI_BRNF).
+ *
+ * Called with rcu read lock held.
+ */
+int br_nf_hook_thresh(unsigned int hook, struct net *net,
+ struct sock *sk, struct sk_buff *skb,
+ struct net_device *indev,
+ struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *,
+ struct sk_buff *))
+{
+   struct nf_hook_ops *elem;
+   struct nf_hook_state state;
+   struct list_head *head;
+   int ret;
+
+   head = &net->nf.hooks[NFPROTO_BRIDGE][hook];
+
+   list_for_each_entry_rcu(elem, head, list) {
+   struct nf_hook_ops *next;
+
+   next = list_entry_rcu(list_next_rcu(&elem->list),
+ struct nf_hook_ops, list);
+   if (next->priority <= NF_BR_PRI_BRNF)
+   continue;
+   }
+
+   if (&elem->list == head)
+   return okfn(net, sk, skb);
+
+   nf_hook_state_init(&state, head, hook, NF_BR_PRI_BRNF + 1,
+  NFPROTO_BRIDGE, indev, outdev, sk, net, okfn);
+
+   ret = nf_hook_slow(skb, &state);
+   if (ret == 1)
+   ret = okfn(net, sk, skb);
+
+   return ret;
+}
+
 #ifdef CONFIG_SYSCTL
 static
 int brnf_sysctl_call_tables(struct ctl_table *ctl, int write,
diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 

[PATCH nf-next 0/3] Compact netfilter hooks list

2016-06-30 Thread Aaron Conole
This series makes a simple change to shrink the netfilter hook list
from a doubly linked list to a singly linked list.  Since the hooks
are always traversed in-order, there is no need to maintain a previous
pointer.

This series is being submitted for early feedback. It was jointly
developed with Florian Westphal.

Aaron Conole (1):
  netfilter: replace list_head with single linked list

Florian Westphal (2):
  netfilter: bridge: add and use br_nf_hook_thresh
  netfilter: call nf_hook_state_init with rcu_read_lock held

 include/linux/netdevice.h  |   2 +-
 include/linux/netfilter.h  |  26 --
 include/linux/netfilter_ingress.h  |  15 ++--
 include/net/netfilter/br_netfilter.h   |   6 ++
 include/net/netfilter/nf_queue.h   |   9 +-
 include/net/netns/netfilter.h  |   2 +-
 net/bridge/br_netfilter_hooks.c|  50 +--
 net/bridge/br_netfilter_ipv6.c |  12 ++-
 net/bridge/netfilter/ebt_redirect.c|   2 +-
 net/bridge/netfilter/ebtables.c|   2 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |   2 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |   2 +-
 net/netfilter/core.c   | 120 +++--
 net/netfilter/nf_conntrack_core.c  |   2 +-
 net/netfilter/nf_conntrack_h323_main.c |   2 +-
 net/netfilter/nf_conntrack_helper.c|   2 +-
 net/netfilter/nf_internals.h   |  10 +--
 net/netfilter/nf_queue.c   |  15 ++--
 net/netfilter/nfnetlink_cthelper.c |   2 +-
 net/netfilter/nfnetlink_log.c  |   8 +-
 net/netfilter/nfnetlink_queue.c|   7 +-
 net/netfilter/xt_helper.c  |   2 +-
 24 files changed, 193 insertions(+), 111 deletions(-)

-- 
2.5.5
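
As a rough illustration of why a forward-only list is enough here, the following user-space sketch inserts, removes and walks hooks using only a next pointer. It is not code from the series: struct hook_entry and the helper names are hypothetical, and the RCU publication the kernel needs is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a hook entry; not the kernel struct. */
struct hook_entry {
	struct hook_entry *next;	/* only a forward pointer is kept */
	int priority;
	int (*fn)(int pkt);		/* returns 1 to accept, 0 to drop */
};

/*
 * Insert in ascending priority order. Walking with a pointer-to-pointer
 * gives us the link to rewrite, so no "prev" pointer is needed.
 */
static void hook_insert(struct hook_entry **head, struct hook_entry *e)
{
	struct hook_entry **pp = head;

	assert(e != NULL);
	while (*pp && (*pp)->priority <= e->priority)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
}

/* Unlink an entry using only forward pointers. Returns 1 if it was found. */
static int hook_remove(struct hook_entry **head, struct hook_entry *e)
{
	struct hook_entry **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return 1;
		}
	}
	return 0;
}

/* Run every hook in list order; stop at the first one that drops. */
static int hook_run(const struct hook_entry *head, int pkt)
{
	const struct hook_entry *e;

	for (e = head; e; e = e->next)
		if (!e->fn(pkt))
			return 0;
	return 1;
}

/* Sample callbacks for the usage example. */
static int accept_all(int pkt) { (void)pkt; return 1; }
static int drop_odd(int pkt) { return (pkt % 2) == 0; }
```

Removal costs a walk from the head instead of an O(1) unlink, which is the trade-off accepted in exchange for the smaller per-entry footprint.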



[PATCH nf-next 2/3] netfilter: call nf_hook_state_init with rcu_read_lock held

2016-06-30 Thread Aaron Conole
From: Florian Westphal 

This makes things simpler because we can store the head of the list
in the nf_state structure without worrying about concurrent add/delete
of hook elements from the list.

Signed-off-by: Florian Westphal 
Signed-off-by: Aaron Conole 
---
 include/linux/netfilter.h  | 8 +++-
 include/linux/netfilter_ingress.h  | 1 +
 net/bridge/netfilter/ebt_redirect.c| 2 +-
 net/bridge/netfilter/ebtables.c| 2 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   | 2 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 2 +-
 net/netfilter/core.c   | 5 +
 net/netfilter/nf_conntrack_core.c  | 2 +-
 net/netfilter/nf_conntrack_h323_main.c | 2 +-
 net/netfilter/nf_conntrack_helper.c| 2 +-
 net/netfilter/nfnetlink_cthelper.c | 2 +-
 net/netfilter/nfnetlink_log.c  | 8 ++--
 net/netfilter/nfnetlink_queue.c| 2 +-
 net/netfilter/xt_helper.c  | 2 +-
 16 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 9230f9a..ad444f0 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -174,10 +174,16 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned 
int hook,
 
if (!list_empty(hook_list)) {
struct nf_hook_state state;
+   int ret;
 
+   /* We may already have this, but read-locks nest anyway */
+   rcu_read_lock();
nf_hook_state_init(&state, hook_list, hook, thresh,
   pf, indev, outdev, sk, net, okfn);
-   return nf_hook_slow(skb, &state);
+
+   ret = nf_hook_slow(skb, &state);
+   rcu_read_unlock();
+   return ret;
}
return 1;
 }
diff --git a/include/linux/netfilter_ingress.h 
b/include/linux/netfilter_ingress.h
index 5fcd375..6965ba0 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -14,6 +14,7 @@ static inline bool nf_hook_ingress_active(const struct 
sk_buff *skb)
return !list_empty(&skb->dev->nf_hooks_ingress);
 }
 
+/* caller must hold rcu_read_lock */
 static inline int nf_hook_ingress(struct sk_buff *skb)
 {
struct nf_hook_state state;
diff --git a/net/bridge/netfilter/ebt_redirect.c 
b/net/bridge/netfilter/ebt_redirect.c
index 20396499..2e7c4f9 100644
--- a/net/bridge/netfilter/ebt_redirect.c
+++ b/net/bridge/netfilter/ebt_redirect.c
@@ -24,7 +24,7 @@ ebt_redirect_tg(struct sk_buff *skb, const struct 
xt_action_param *par)
return EBT_DROP;
 
if (par->hooknum != NF_BR_BROUTING)
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
ether_addr_copy(eth_hdr(skb)->h_dest,
br_port_get_rcu(par->in)->br->dev->dev_addr);
else
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 5a61f35..6faa2c3 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -148,7 +148,7 @@ ebt_basic_match(const struct ebt_entry *e, const struct 
sk_buff *skb,
return 1;
if (FWINV2(ebt_dev_check(e->out, out), EBT_IOUT))
return 1;
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
if (in && (p = br_port_get_rcu(in)) != NULL &&
FWINV2(ebt_dev_check(e->logical_in, p->br->dev), EBT_ILOGICALIN))
return 1;
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c 
b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index ae1a71a..eab0239 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -110,7 +110,7 @@ static unsigned int ipv4_helper(void *priv,
if (!help)
return NF_ACCEPT;
 
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
helper = rcu_dereference(help->helper);
if (!helper)
return NF_ACCEPT;
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c 
b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index c567e1b..2c08d6a 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -149,7 +149,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, 
struct sk_buff *skb,
return -NF_ACCEPT;
}
 
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
innerproto = __nf_ct_l4proto_find(PF_INET, origtuple.dst.protonum);
 
/* Ordinarily, we'd expect the inverted tupleproto, but it's
diff 

[PATCH net v2] net: bcmsysport: Device stats are unsigned long

2016-06-30 Thread Florian Fainelli
On 64-bit kernels, device stats are 64 bits wide, not 32 bits.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
driver")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- use a plain cast to unsigned long

 drivers/net/ethernet/broadcom/bcmsysport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 543bf38105c9..bfa26a2590c9 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -392,7 +392,7 @@ static void bcm_sysport_get_stats(struct net_device *dev,
else
p = (char *)priv;
p += s->stat_offset;
-   data[i] = *(u32 *)p;
+   data[i] = *(unsigned long *)p;
}
 }
 
-- 
2.7.4
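
To make the width problem concrete, here is a small user-space sketch, not driver code. struct fake_stats and both reader functions are hypothetical; memcpy is used instead of a pointer cast so the example stays well defined in portable C.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats block; the field names are made up for illustration. */
struct fake_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
};

/* Read a counter at a byte offset as a 32-bit value (the old behaviour). */
static uint64_t read_stat_u32(const void *base, size_t offset)
{
	uint32_t v;

	assert(base != NULL);
	memcpy(&v, (const char *)base + offset, sizeof(v));
	return v;
}

/* Read the same counter at its full unsigned long width (the v2 behaviour). */
static uint64_t read_stat_ulong(const void *base, size_t offset)
{
	unsigned long v;

	assert(base != NULL);
	memcpy(&v, (const char *)base + offset, sizeof(v));
	return v;
}
```

On a target where unsigned long is 64 bits wide, the 32-bit read returns only half of the counter (which half depends on endianness), while the full-width read returns the stored value.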



Re: [PATCH net-next 19/19] rxrpc: Use RCU to access a peer's service connection tree

2016-06-30 Thread Peter Zijlstra
On Thu, Jun 30, 2016 at 05:36:51PM +0100, David Howells wrote:
> David Howells  wrote:
> 
> > > You want rb_link_node_rcu() here.
> > 
> > Should there be an rb_replace_node_rcu() also?
> 
> Or I could make rb_replace_node() RCU friendly.  What do you think of the
> attached changes (split into appropriate patches)?  It's a case of changing
> the order in which pointers are set in the rbtree code and inserting a
> barrier.

> diff --git a/lib/rbtree.c b/lib/rbtree.c
> index 1356454e36de..2b1a190c737c 100644
> --- a/lib/rbtree.c
> +++ b/lib/rbtree.c
> @@ -539,15 +539,17 @@ void rb_replace_node(struct rb_node *victim, struct 
> rb_node *new,
>  {
>   struct rb_node *parent = rb_parent(victim);
>  
> + /* Copy the pointers/colour from the victim to the replacement */
> + *new = *victim;
> +
>   /* Set the surrounding nodes to point to the replacement */
> - __rb_change_child(victim, new, parent, root);
>   if (victim->rb_left)
>   rb_set_parent(victim->rb_left, new);
>   if (victim->rb_right)
>   rb_set_parent(victim->rb_right, new);
>  
> - /* Copy the pointers/colour from the victim to the replacement */
> - *new = *victim;
> + /* Set the onward pointer last with an RCU barrier */
> + __rb_change_child_rcu(victim, new, parent, root);
>  }
>  EXPORT_SYMBOL(rb_replace_node);

So back when I did this work there was resistance to making the regular
RB-tree primitives more expensive for the rare RCU user. And I suspect
that this is still so.

Now, rb_replace_node() isn't a widely used primitive, so it might go
unnoticed, but since we already have rb_link_node_rcu() adding
rb_replace_node_rcu() is the consistent thing to do.
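
The ordering being discussed (fill in the replacement first, publish the onward pointer last, with a barrier) can be sketched in user space with C11 atomics. This is only an analogy: struct node and both helpers are hypothetical, parent pointers and colours are omitted, and acquire/release ordering is used as a conservative stand-in for the kernel's RCU primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical tree node; parent pointers and colour are left out. */
struct node {
	int key;
	struct node *left, *right;
};

/*
 * Replace victim with repl in the slot that currently points at victim.
 * repl's links are filled in first; the release store then publishes it,
 * so a reader that sees repl also sees fully initialised links.
 */
static void replace_node_published(struct node *_Atomic *slot,
				   const struct node *victim,
				   struct node *repl)
{
	assert(atomic_load_explicit(slot, memory_order_relaxed) == victim);
	repl->left = victim->left;
	repl->right = victim->right;
	atomic_store_explicit(slot, repl, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static const struct node *lookup_slot(struct node *_Atomic *slot)
{
	return atomic_load_explicit(slot, memory_order_acquire);
}
```

Keeping this in a separate helper mirrors the argument above: callers that never have concurrent readers keep the cheaper plain stores.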


> diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
> index dc64211c5ee8..298ec300cfcc 100644
> --- a/net/rxrpc/conn_service.c
> +++ b/net/rxrpc/conn_service.c
> @@ -41,14 +41,14 @@ struct rxrpc_connection 
> *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>*/
>   read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
>  
> - p = peer->service_conns.rb_node;
> + p = rcu_dereference(peer->service_conns.rb_node);
>   while (p) {
>   conn = rb_entry(p, struct rxrpc_connection, 
> service_node);
>  
>   if (conn->proto.index_key < k.index_key)
> - p = p->rb_left;
> + p = rcu_dereference(p->rb_left);
>   else if (conn->proto.index_key > k.index_key)
> - p = p->rb_right;
> + p = rcu_dereference(p->rb_right);
>   else
>   goto done;
>   conn = NULL;
> @@ -90,7 +90,7 @@ rxrpc_publish_service_conn(struct rxrpc_peer *peer,
>   goto found_extant_conn;
>   }
>  
> - rb_link_node(&conn->service_node, parent, pp);
> + rb_link_node_rcu(&conn->service_node, parent, pp);
>   rb_insert_color(&conn->service_node, &peer->service_conns);
>  conn_published:
>   set_bit(RXRPC_CONN_IN_SERVICE_CONNS, &conn->flags);

Yep, that's about right.


Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))

2016-06-30 Thread Steven Rostedt
On Thu, 30 Jun 2016 18:30:42 +
Trond Myklebust  wrote:


> Wait. So the NFS mount is still active, it’s just that the socket
> disconnected due to no traffic? That should be OK. Granted that the
> port can’t be reused by another process, but you really don’t want
> that: what if there are no other ports available and you start
> writing to a file on the NFS partition?

What would cause the port to be connected to a socket again? I copied a
large file to the NFS mount, and the hidden port is still there.

Remember, this wasn't always the case, the hidden port is a recent
issue.

I ran wireshark on this and it appears to create two ports for NFS. One
of them is canceled by the client (which sends a FIN/ACK); this is the
port that lies around, never to be used again, while the other port is
used for all connections after that.

When I unmount the NFS directory, the port is finally freed (but has no
socket attached to it). What is the purpose of keeping this port around?

I can reproduce this by having the client unmount and remount the
directory.

-- Steve


Re: [PATCH net] net: bcmsysport: Device stats are unsigned long

2016-06-30 Thread Florian Fainelli
On 06/30/2016 11:33 AM, Andrew Lunn wrote:
> On Thu, Jun 30, 2016 at 10:56:29AM -0700, Florian Fainelli wrote:
>> On 64bits kernels, device stats are 64bits wide, not 32bits.
>>
>> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
>> driver")
>> Signed-off-by: Florian Fainelli 
>> ---
>>  drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
>> b/drivers/net/ethernet/broadcom/bcmsysport.c
>> index 543bf38105c9..21f21e23e695 100644
>> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
>> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
>> @@ -392,7 +392,11 @@ static void bcm_sysport_get_stats(struct net_device 
>> *dev,
>>  else
>>  p = (char *)priv;
>>  p += s->stat_offset;
>> -data[i] = *(u32 *)p;
> 
> Hi Florian
> 
> Could you not just change this cast from u32 to unsigned long and be
> done?

Seems like this would work yes, even with our mixture of u32 stats read
from HW and the software netdev stats, thanks!
-- 
Florian


Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Daniel Borkmann

Hi Cong,

On 06/30/2016 07:15 PM, Cong Wang wrote:

Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum 
validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim 
Cc: Tom Herbert 
Signed-off-by: Cong Wang 
---
  include/linux/skbuff.h | 19 +++
  net/core/skbuff.c  | 18 --
  net/sched/act_mirred.c |  2 +-
  3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ee38a41..61ab566 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2870,6 +2870,25 @@ static inline void skb_postpush_rcsum(struct sk_buff 
*skb,
  }

  /**
+ * skb_push_rcsum - push skb and update receive checksum
+ * @skb: buffer to update
+ * @len: length of data pulled
+ *
+ * This function performs an skb_push on the packet and updates
+ * the CHECKSUM_COMPLETE checksum.  It should be used on
+ * receive path processing instead of skb_push unless you know
+ * that the checksum difference is zero (e.g., a valid IP header)
+ * or you are setting ip_summed to CHECKSUM_NONE.
+ */
+static inline unsigned char *skb_push_rcsum(struct sk_buff *skb,
+   unsigned int len)
+{
+   skb_push(skb, len);
+   skb_postpush_rcsum(skb, skb->data, len);
+   return skb->data;
+}
+
+/**
   *pskb_trim_rcsum - trim received skb and update checksum
   *@skb: buffer to trim
   *@len: new length
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f2b77e5..eb12d21 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3016,24 +3016,6 @@ int skb_append_pagefrags(struct sk_buff *skb, struct 
page *page,
  EXPORT_SYMBOL_GPL(skb_append_pagefrags);

  /**
- * skb_push_rcsum - push skb and update receive checksum
- * @skb: buffer to update
- * @len: length of data pulled
- *
- * This function performs an skb_push on the packet and updates
- * the CHECKSUM_COMPLETE checksum.  It should be used on
- * receive path processing instead of skb_push unless you know
- * that the checksum difference is zero (e.g., a valid IP header)
- * or you are setting ip_summed to CHECKSUM_NONE.
- */
-static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len)
-{
-   skb_push(skb, len);
-   skb_postpush_rcsum(skb, skb->data, len);
-   return skb->data;
-}
-
-/**
   *skb_pull_rcsum - pull skb and update receive checksum
   *@skb: buffer to update
   *@len: length of data pulled


Fix looks good to me, just a minor comment.

Maybe it makes sense to move not only skb_push_rcsum() but /also/
skb_pull_rcsum() to the header then? Both seem similarly small at least
(this could be split, e.g., into two patches: first the move, then the
actual fix).

Thanks,
Daniel
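
For readers unfamiliar with the fixup being discussed: when bytes are pushed in front of data whose ones-complement sum is already known, the sum of the new bytes is folded into the old sum. Below is a minimal user-space sketch of that arithmetic, assuming an even-length header; the function names are hypothetical and this is not the kernel's csum code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 16-bit ones-complement sum of a byte range (not inverted), reading the
 * data as big-endian 16-bit words. Intended for packet-sized buffers only;
 * the 32-bit accumulator is folded once at the end.
 */
static uint16_t csum_partial16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad the odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	return (uint16_t)sum;
}

/* Ones-complement addition of two partial sums. */
static uint16_t csum_add16(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Sum of the packet after hdr_len bytes were pushed in front of data whose
 * sum was old_sum. hdr_len must be even so the old data keeps its 16-bit
 * word alignment.
 */
static uint16_t csum_after_push(uint16_t old_sum, const uint8_t *hdr,
				size_t hdr_len)
{
	assert((hdr_len & 1) == 0);
	return csum_add16(old_sum, csum_partial16(hdr, hdr_len));
}
```

If the pushed bytes are not accounted for, a sum computed over the payload alone no longer matches the data it is claimed to cover, which is the kind of mismatch the fix is meant to avoid.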


Re: [PATCH] net: stmmac: Fix null-function call in ISR on stmmac1000

2016-06-30 Thread Matt Corallo
Damn mail clients and their helpful corruption of patches...
Resent w/o the extra \n in the diff header.

On 06/29/16 07:58, David Miller wrote:
> From: Matt Corallo 
> Date: Sat, 25 Jun 2016 19:35:03 +
> 
>> At least on Meson GXBB, the CORE_IRQ_MTL_RX_OVERFLOW interrupt is thrown
>> with the stmmac1000 driver, which does not support set_rx_tail_ptr. With
>> this patch and the clock fixes, 1G ethernet works on ODROID-C2.
>>
>> Signed-off-by: Matt Corallo 
> 
> This patch does not apply without rejects to any of my trees.
> 
> ___
> linux-amlogic mailing list
> linux-amlo...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic
> 


[PATCH] net: stmmac: Fix null-function call in ISR on stmmac1000

2016-06-30 Thread Matt Corallo
(resent due to overhelpful mail client corrupting patch)

At least on Meson GXBB, the CORE_IRQ_MTL_RX_OVERFLOW interrupt is thrown
with the stmmac1000 driver, which does not support set_rx_tail_ptr. With
this patch and the clock fixes, 1G ethernet works on ODROID-C2.

Signed-off-by: Matt Corallo 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a473c18..e407126 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2804,7 +2804,7 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
priv->tx_path_in_lpi_mode = true;
if (status & CORE_IRQ_TX_PATH_EXIT_LPI_MODE)
priv->tx_path_in_lpi_mode = false;
-   if (status & CORE_IRQ_MTL_RX_OVERFLOW)
+   if (status & CORE_IRQ_MTL_RX_OVERFLOW && 
priv->hw->dma->set_rx_tail_ptr)
priv->hw->dma->set_rx_tail_ptr(priv->ioaddr,
priv->rx_tail_addr,
STMMAC_CHAN0);
-- 
2.1.4
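
The pattern behind this fix, checking an optional callback before calling it, can be shown in a few lines of user-space C. Everything here is hypothetical (struct dma_ops, struct dev_state, IRQ_RX_OVERFLOW and handle_irq are illustrative names, not the stmmac definitions).

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_RX_OVERFLOW 0x1u	/* hypothetical status bit */

/* Hypothetical ops table; older cores leave set_rx_tail_ptr as NULL. */
struct dma_ops {
	void (*set_rx_tail_ptr)(unsigned int *reg, unsigned int tail);
};

struct dev_state {
	const struct dma_ops *dma;
	unsigned int tail_reg;		/* stands in for a device register */
	unsigned int rx_tail_addr;
};

/*
 * Handle an interrupt status word. The optional callback is only invoked
 * when the overflow bit is set AND the backend actually provides it.
 * Returns 1 if the tail pointer was rewritten, 0 otherwise.
 */
static int handle_irq(struct dev_state *d, unsigned int status)
{
	assert(d != NULL && d->dma != NULL);
	if ((status & IRQ_RX_OVERFLOW) && d->dma->set_rx_tail_ptr) {
		d->dma->set_rx_tail_ptr(&d->tail_reg, d->rx_tail_addr);
		return 1;
	}
	return 0;
}

/* Sample backend implementation for the usage example. */
static void write_tail(unsigned int *reg, unsigned int tail) { *reg = tail; }
```

A backend without the callback simply ignores the overflow status instead of jumping through a NULL pointer.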


[RFC PATCH] ila: Resolver mechanism

2016-06-30 Thread Tom Herbert
This is the first cut at an ILA resolver using LWT to implement
the hook to a userspace resolver.

The idea is that the kernel sets an ILA resolver route to the
SIR prefix, something like:

ip route add ::/64 encap ila-resolve \
 via 2401:db00:20:911a::27:0 dev eth0

When a packet hits the route, it is forwarded to the destination
using the via path, and an rtnl message is also generated with
group RTNLGRP_ILA_NOTIFY and type RTM_ADDR_RESOLVE. A userspace
daemon can listen for such messages and perform an ILA resolution
protocol to determine the ILA mapping. If the mapping is resolved,
a /128 ila encap route is set so that the host can perform
ILA translation and send directly to the destination.

This is not yet complete; we would still need some controls
to rate limit the number of resolution requests and a means to track
pending requests. I'm posting this as an RFC because it seems like
this might be part of a general mechanism to perform address
resolution in userspace, and I would appreciate comments with
regard to that.
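
As a rough illustration of what a userspace daemon might do with the payload of such a message, the sketch below copies a `struct ila_notify` (as defined in this RFC) out of a received buffer. The netlink socket setup is left out because the group and message type exist only in this RFC; `ila_parse_notify()` is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

#define ILA_NOTIFY_SIR_DEST 1

/* Mirrors the notification payload defined in the RFC patch. */
struct ila_notify {
	int type;
	struct in6_addr addr;
};

/*
 * Hypothetical helper: extract the SIR destination address from a
 * notification payload. Returns 0 on success, -1 if the buffer is too
 * short or the notification type is not recognized.
 */
static int ila_parse_notify(const void *buf, size_t len, struct in6_addr *out)
{
	struct ila_notify n;

	assert(buf != NULL && out != NULL);

	if (len < sizeof(n))
		return -1;

	/* Copy instead of casting, so an unaligned buffer is safe. */
	memcpy(&n, buf, sizeof(n));
	if (n.type != ILA_NOTIFY_SIR_DEST)
		return -1;

	*out = n.addr;
	return 0;
}
```

A daemon would presumably start its resolution protocol for the returned address and install the /128 route once the mapping is known.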

---
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |   5 ++
 net/ipv6/ila/Makefile  |   2 +-
 net/ipv6/ila/ila.h |   2 +
 net/ipv6/ila/ila_common.c  |   7 ++
 net/ipv6/ila/ila_resolver.c| 145 +
 6 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv6/ila/ila_resolver.c

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index a478fe8..d880e49 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_IP,
LWTUNNEL_ENCAP_ILA,
LWTUNNEL_ENCAP_IP6,
+   LWTUNNEL_ENCAP_ILA_NOTIFY,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f037..271215f 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -144,6 +144,9 @@ enum {
RTM_GETSTATS = 94,
 #define RTM_GETSTATS RTM_GETSTATS
 
+   RTM_ADDR_RESOLVE = 95,
+#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
+
__RTM_MAX,
 #define RTM_MAX        (((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -656,6 +659,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_MPLS_ROUTE RTNLGRP_MPLS_ROUTE
RTNLGRP_NSID,
 #define RTNLGRP_NSID   RTNLGRP_NSID
+   RTNLGRP_ILA_NOTIFY,
+#define RTNLGRP_ILA_NOTIFY RTNLGRP_ILA_NOTIFY
__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX    (__RTNLGRP_MAX - 1)
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 4b32e59..f2aadc3 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o ila_xlat.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o ila_resolver.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index e0170f6..382d360 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -118,5 +118,7 @@ int ila_lwt_init(void);
 void ila_lwt_fini(void);
 int ila_xlat_init(void);
 void ila_xlat_fini(void);
+int ila_rslv_init(void);
+void ila_rslv_fini(void);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index ec9efbc..0a09557 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -157,7 +157,13 @@ static int __init ila_init(void)
if (ret)
goto fail_xlat;
 
+   ret = ila_rslv_init();
+   if (ret)
+   goto fail_rslv;
+
return 0;
+fail_rslv:
+   ila_xlat_fini();
 fail_xlat:
ila_lwt_fini();
 fail_lwt:
@@ -168,6 +174,7 @@ static void __exit ila_fini(void)
 {
ila_xlat_fini();
ila_lwt_fini();
+   ila_rslv_fini();
 }
 
 module_init(ila_init);
diff --git a/net/ipv6/ila/ila_resolver.c b/net/ipv6/ila/ila_resolver.c
new file mode 100644
index 0000000..22bb2bd
--- /dev/null
+++ b/net/ipv6/ila/ila_resolver.c
@@ -0,0 +1,145 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ila.h"
+
+struct ila_notify {
+   int type;
+   struct in6_addr addr;
+};
+
+#define ILA_NOTIFY_SIR_DEST 1
+
+static int ila_fill_notify(struct sk_buff *skb, struct in6_addr *addr,
+  u32 pid, u32 seq, int event, int flags)
+{
+   struct ila_notify *nila;
+   struct nlmsghdr *nlh;
+
+   nlh = nlmsg_put(skb, pid, seq, event, sizeof(*nila), flags);
+   if (nlh == NULL)
+   return -EMSGSIZE;
+
+   nila = nlmsg_data(nlh);
+   nila->type = ILA_NOTIFY_SIR_DEST;
+   nila->addr = *addr;
+
+   nlmsg_end(skb, nlh);
+
+   return 0;
+}
+
+void ila_rslv_notify(struct net *net, struct sk_buff *skb)
+{
+   struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   struct sk_buff *nlskb;
+   int err = 0;
+
+   /* Send ILA notification to user */
+   nlskb = 

Re: ethtool needs a new maintainer

2016-06-30 Thread Ben Hutchings
On Thu, 2016-06-30 at 14:27 -0500, Jorge Alberto Garcia wrote:
> On Thu, Jun 30, 2016 at 1:15 PM, John W. Linville
>  wrote:
> > On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
> > > On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > > > I've become steadily less enthusiastic and less responsive as a
> > > > maintainer over the past year or so.  I no longer work on networking
> > > > regularly, so it takes a lot more time to get into the right state of
> > > > mind to think about ethtool code, while I have other demands on my time
> > > > that tend to take priority.
> > > > 
> > > > So, I would like to find a new maintainer to take over as soon as
> > > > possible.  Ideally the new maintainer would have previous contributions
> > > > to ethtool and an existing account on kernel.org so that they can push
> > > > to the git repository and the home page.  But neither of those is
> > > > essential.  Please reply if you're interested.
> > > 
> > > I would like to take this responsibility. My previous contributions
> > > to ethtool are meager, but I think my skills and interests are suited
> > > to the task.  Plus, I already have a kernel.org account... :-)
> > 
> > Are there any other takers?  Or is this a done deal?
> > 
> 
> hi guys !, any link to a bugzilla  / patchwork  ?

There's nothing as organised as that, though it might be possible to
add categories for ethtool on  and
.

Ben.

-- 

Ben Hutchings
To err is human; to really foul things up requires a computer.




Re: ethtool needs a new maintainer

2016-06-30 Thread Jorge Alberto Garcia
On Thu, Jun 30, 2016 at 1:15 PM, John W. Linville
 wrote:
> On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
>> On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
>> > I've become steadily less enthusiastic and less responsive as a
>> > maintainer over the past year or so.  I no longer work on networking
>> > regularly, so it takes a lot more time to get into the right state of
>> > mind to think about ethtool code, while I have other demands on my time
>> > that tend to take priority.
>> >
>> > So, I would like to find a new maintainer to take over as soon as
>> > possible.  Ideally the new maintainer would have previous contributions
>> > to ethtool and an existing account on kernel.org so that they can push
>> > to the git repository and the home page.  But neither of those is
>> > essential.  Please reply if you're interested.
>>
>> I would like to take this responsibility. My previous contributions
>> to ethtool are meager, but I think my skills and interests are suited
>> to the task.  Plus, I already have a kernel.org account... :-)
>
> Are there any other takers?  Or is this a done deal?
>

Hi guys! Any link to a bugzilla / patchwork?

> John
> --
> John W. Linville                Someday the world will need a hero, and you
> linvi...@tuxdriver.com          might be all we have.  Be ready.


[PATCH 2/4] net: ethernet: ti: cpsw: add multi queue support

2016-06-30 Thread Ivan Khoronzhuk
The cpsw h/w supports up to 8 tx and 8 rx channels. This patch adds
multi-queue support to the driver. The ability to configure the h/w
shaper will be added in a separate patch. The default shaper mode is,
as before, priority mode.

The poll function handles all unprocessed channels until all of
them are free, beginning with the highest-priority channel.
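
A minimal sketch of that polling order follows. It assumes that a higher channel index means higher priority and that work is capped by a NAPI-style budget; both are assumptions for illustration, and `poll_channels()` is a hypothetical name rather than the driver's poll routine.

```c
#include <assert.h>

#define MAX_QUEUES 8

/*
 * Drain pending work from the channels, highest index first, without
 * exceeding the budget. pending[ch] is the number of unprocessed
 * packets on channel ch and is decremented as work is done.
 * Returns the total number of packets processed.
 */
static int poll_channels(int pending[MAX_QUEUES], int budget)
{
	int ch, done = 0;

	assert(budget >= 0);

	for (ch = MAX_QUEUES - 1; ch >= 0 && done < budget; ch--) {
		int n = pending[ch];

		/* Never take more than what is left of the budget. */
		if (n > budget - done)
			n = budget - done;

		pending[ch] -= n;
		done += n;
	}

	return done;
}
```

Lower-priority channels are only served once the channels above them have been emptied, so they keep any work that did not fit into the budget.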

The statistics for every channel can be read with:
ethtool -S ethX

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  | 334 +---
 drivers/net/ethernet/ti/davinci_cpdma.c |  12 ++
 drivers/net/ethernet/ti/davinci_cpdma.h |   2 +
 3 files changed, 237 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a713336..14d53eb 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -140,6 +140,8 @@ do {
\
 #define CPSW_CMINTMAX_INTVL    (1000 / CPSW_CMINTMIN_CNT)
 #define CPSW_CMINTMIN_INTVL    ((1000 / CPSW_CMINTMAX_CNT) + 1)
 
+#define CPSW_MAX_QUEUES        8
+
 #define cpsw_slave_index(priv) \
((priv->data.dual_emac) ? priv->emac_port : \
priv->data.active_slave)
@@ -383,7 +385,8 @@ struct cpsw_priv {
u8  mac_addr[ETH_ALEN];
struct cpsw_slave   *slaves;
struct cpdma_ctlr   *dma;
-   struct cpdma_chan   *txch, *rxch;
+   struct cpdma_chan   *txch[CPSW_MAX_QUEUES];
+   struct cpdma_chan   *rxch[CPSW_MAX_QUEUES];
struct cpsw_ale *ale;
bool rx_pause;
bool tx_pause;
@@ -395,6 +398,7 @@ struct cpsw_priv {
u32 num_irqs;
struct cpts *cpts;
u32 emac_port;
+   int rx_ch_num, tx_ch_num;
 };
 
 struct cpsw_stats {
@@ -455,35 +459,26 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
{ "Rx Start of Frame Overruns", CPSW_STAT(rxsofoverruns) },
{ "Rx Middle of Frame Overruns", CPSW_STAT(rxmofoverruns) },
{ "Rx DMA Overruns", CPSW_STAT(rxdmaoverruns) },
-   { "Rx DMA chan: head_enqueue", CPDMA_RX_STAT(head_enqueue) },
-   { "Rx DMA chan: tail_enqueue", CPDMA_RX_STAT(tail_enqueue) },
-   { "Rx DMA chan: pad_enqueue", CPDMA_RX_STAT(pad_enqueue) },
-   { "Rx DMA chan: misqueued", CPDMA_RX_STAT(misqueued) },
-   { "Rx DMA chan: desc_alloc_fail", CPDMA_RX_STAT(desc_alloc_fail) },
-   { "Rx DMA chan: pad_alloc_fail", CPDMA_RX_STAT(pad_alloc_fail) },
-   { "Rx DMA chan: runt_receive_buf", CPDMA_RX_STAT(runt_receive_buff) },
-   { "Rx DMA chan: runt_transmit_buf", CPDMA_RX_STAT(runt_transmit_buff) },
-   { "Rx DMA chan: empty_dequeue", CPDMA_RX_STAT(empty_dequeue) },
-   { "Rx DMA chan: busy_dequeue", CPDMA_RX_STAT(busy_dequeue) },
-   { "Rx DMA chan: good_dequeue", CPDMA_RX_STAT(good_dequeue) },
-   { "Rx DMA chan: requeue", CPDMA_RX_STAT(requeue) },
-   { "Rx DMA chan: teardown_dequeue", CPDMA_RX_STAT(teardown_dequeue) },
-   { "Tx DMA chan: head_enqueue", CPDMA_TX_STAT(head_enqueue) },
-   { "Tx DMA chan: tail_enqueue", CPDMA_TX_STAT(tail_enqueue) },
-   { "Tx DMA chan: pad_enqueue", CPDMA_TX_STAT(pad_enqueue) },
-   { "Tx DMA chan: misqueued", CPDMA_TX_STAT(misqueued) },
-   { "Tx DMA chan: desc_alloc_fail", CPDMA_TX_STAT(desc_alloc_fail) },
-   { "Tx DMA chan: pad_alloc_fail", CPDMA_TX_STAT(pad_alloc_fail) },
-   { "Tx DMA chan: runt_receive_buf", CPDMA_TX_STAT(runt_receive_buff) },
-   { "Tx DMA chan: runt_transmit_buf", CPDMA_TX_STAT(runt_transmit_buff) },
-   { "Tx DMA chan: empty_dequeue", CPDMA_TX_STAT(empty_dequeue) },
-   { "Tx DMA chan: busy_dequeue", CPDMA_TX_STAT(busy_dequeue) },
-   { "Tx DMA chan: good_dequeue", CPDMA_TX_STAT(good_dequeue) },
-   { "Tx DMA chan: requeue", CPDMA_TX_STAT(requeue) },
-   { "Tx DMA chan: teardown_dequeue", CPDMA_TX_STAT(teardown_dequeue) },
 };
 
-#define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats)
+static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
+   { "head_enqueue", CPDMA_RX_STAT(head_enqueue) },
+   { "tail_enqueue", CPDMA_RX_STAT(tail_enqueue) },
+   { "pad_enqueue", CPDMA_RX_STAT(pad_enqueue) },
+   { "misqueued", CPDMA_RX_STAT(misqueued) },
+   { "desc_alloc_fail", CPDMA_RX_STAT(desc_alloc_fail) },
+   { "pad_alloc_fail", CPDMA_RX_STAT(pad_alloc_fail) },
+   { "runt_receive_buf", CPDMA_RX_STAT(runt_receive_buff) },
+   { "runt_transmit_buf", CPDMA_RX_STAT(runt_transmit_buff) },
+   { "empty_dequeue", CPDMA_RX_STAT(empty_dequeue) },
+   { "busy_dequeue", CPDMA_RX_STAT(busy_dequeue) },
+   { "good_dequeue", CPDMA_RX_STAT(good_dequeue) },
+   { "requeue", CPDMA_RX_STAT(requeue) },
+   { "teardown_dequeue", 

[PATCH 3/4] net: ethernet: ti: davinci_cpdma: move cpdma channel struct macroses to internals

2016-06-30 Thread Ivan Khoronzhuk
It is better to move the helpers that work with channel internals into
the C file. Currently drivers do not need to know whether a channel is
rx or tx, except in the create function. So correct the "channel create"
function and keep all channel struct macros for internal use only.
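
The numbering scheme being hidden here can be sketched as follows. The macros mirror the ones in the patch; `CPDMA_MAX_CHANNELS` is fixed at 32 purely for illustration (the header defines it as BITS_PER_LONG), and `chan_global_num()`, `chan_is_rx()` and `chan_linear_num()` are hypothetical names.

```c
#include <assert.h>

/* Assumption for this sketch; the driver uses BITS_PER_LONG. */
#define CPDMA_MAX_CHANNELS	32

#define tx_chan_num(chan)	(chan)
#define rx_chan_num(chan)	((chan) + CPDMA_MAX_CHANNELS)
/* Same mask as the patch's linear-number macro. */
#define chan_linear_num(num)	((num) & (CPDMA_MAX_CHANNELS - 1))

/*
 * Map a per-direction index plus a direction flag to the
 * controller-wide channel number, as the reworked create function
 * does with its rx_type argument.
 */
static int chan_global_num(int chan_num, int rx_type)
{
	assert(chan_num >= 0 && chan_num < CPDMA_MAX_CHANNELS);
	return rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
}

/* A controller-wide number in the upper half denotes an rx channel. */
static int chan_is_rx(int global_num)
{
	return global_num >= CPDMA_MAX_CHANNELS;
}
```

Callers then pass only a small index and a direction flag, and the offset between the tx and rx halves stays private to the cpdma code.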

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  |  6 ++
 drivers/net/ethernet/ti/davinci_cpdma.c | 13 +++--
 drivers/net/ethernet/ti/davinci_cpdma.h |  9 +
 drivers/net/ethernet/ti/davinci_emac.c  |  8 
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 14d53eb..595ed56 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2575,10 +2575,8 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_runtime_disable_ret;
}
 
-   priv->txch[0] = cpdma_chan_create(priv->dma, tx_chan_num(0),
- cpsw_tx_handler);
-   priv->rxch[0] = cpdma_chan_create(priv->dma, rx_chan_num(0),
- cpsw_rx_handler);
+   priv->txch[0] = cpdma_chan_create(priv->dma, 0, cpsw_tx_handler, 0);
+   priv->rxch[0] = cpdma_chan_create(priv->dma, 0, cpsw_rx_handler, 1);
 
if (WARN_ON(!priv->rxch[0] || !priv->txch[0])) {
dev_err(priv->dev, "error initializing dma channels\n");
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index a4b299d..d6c4967 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -126,6 +126,13 @@ struct cpdma_chan {
int int_set, int_clear, td;
 };
 
+#define tx_chan_num(chan)  (chan)
+#define rx_chan_num(chan)  ((chan) + CPDMA_MAX_CHANNELS)
+#define is_rx_chan(chan)   ((chan)->chan_num >= CPDMA_MAX_CHANNELS)
+#define is_tx_chan(chan)   (!is_rx_chan(chan))
+#define __chan_linear(chan_num)        ((chan_num) & (CPDMA_MAX_CHANNELS - 1))
+#define chan_linear(chan)  __chan_linear((chan)->chan_num)
+
 /* The following make access to common cpdma_ctlr params more readable */
 #define dmaregs        params.dmaregs
 #define num_chan   params.num_chan
@@ -520,12 +527,14 @@ static void cpdma_chan_split_pool(struct cpdma_ctlr *ctlr)
 }
 
 struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
-cpdma_handler_fn handler)
+cpdma_handler_fn handler, int rx_type)
 {
+   int offset = chan_num * 4;
struct cpdma_chan *chan;
-   int offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
unsigned long flags;
 
+   chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
+
if (__chan_linear(chan_num) >= ctlr->num_chan)
return NULL;
 
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h 
b/drivers/net/ethernet/ti/davinci_cpdma.h
index 3ce91a1..52db03a 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -17,13 +17,6 @@
 
 #define CPDMA_MAX_CHANNELS BITS_PER_LONG
 
-#define tx_chan_num(chan)  (chan)
-#define rx_chan_num(chan)  ((chan) + CPDMA_MAX_CHANNELS)
-#define is_rx_chan(chan)   ((chan)->chan_num >= CPDMA_MAX_CHANNELS)
-#define is_tx_chan(chan)   (!is_rx_chan(chan))
-#define __chan_linear(chan_num)        ((chan_num) & (CPDMA_MAX_CHANNELS - 1))
-#define chan_linear(chan)  __chan_linear((chan)->chan_num)
-
 #define CPDMA_RX_SOURCE_PORT(__status__)   ((__status__ >> 16) & 0x7)
 
 #define CPDMA_EOI_RX_THRESH    0x0
@@ -80,7 +73,7 @@ int cpdma_ctlr_stop(struct cpdma_ctlr *ctlr);
 int cpdma_ctlr_dump(struct cpdma_ctlr *ctlr);
 
 struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
-cpdma_handler_fn handler);
+cpdma_handler_fn handler, int rx_type);
 int cpdma_chan_get_rx_buf_num(struct cpdma_chan *chan);
 int cpdma_chan_destroy(struct cpdma_chan *chan);
 int cpdma_chan_start(struct cpdma_chan *chan);
diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index f56d66e..1df0c89 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -2008,10 +2008,10 @@ static int davinci_emac_probe(struct platform_device 
*pdev)
goto no_pdata;
}
 
-   priv->txchan = cpdma_chan_create(priv->dma, tx_chan_num(EMAC_DEF_TX_CH),
-  emac_tx_handler);
-   priv->rxchan = cpdma_chan_create(priv->dma, rx_chan_num(EMAC_DEF_RX_CH),
-  emac_rx_handler);
+   priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
+emac_tx_handler, 0);
+   priv->rxchan = cpdma_chan_create(priv->dma, 

[PATCH 4/4] net: ethernet: ti: cpsw: add ethtool channels support

2016-06-30 Thread Ivan Khoronzhuk
These ops allow controlling the number of channels the driver is
allowed to work with. The maximum number of channels is 8 for rx and
8 for tx. After this patch the following commands are possible:

$ ethtool -l eth0
$ ethtool -L eth0 rx 6 tx 6

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 188 +
 1 file changed, 188 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 595ed56..729b8be 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -740,6 +740,11 @@ static void cpsw_rx_handler(void *token, int len, int 
status)
}
 
 requeue:
+   if (netif_dormant(ndev)) {
+   dev_kfree_skb_any(new_skb);
+   return;
+   }
+
ch = priv->rxch[skb_get_queue_mapping(new_skb)];
ret = cpdma_chan_submit(ch, new_skb, new_skb->data,
skb_tailroom(new_skb), 0);
@@ -2077,6 +2082,187 @@ static void cpsw_ethtool_op_complete(struct net_device 
*ndev)
cpsw_err(priv, drv, "ethtool complete failed %d\n", ret);
 }
 
+static void cpsw_get_channels(struct net_device *dev,
+ struct ethtool_channels *ch)
+{
+   struct cpsw_priv *priv = netdev_priv(dev);
+
+   ch->max_combined = 0;
+   ch->max_rx = CPSW_MAX_QUEUES;
+   ch->max_tx = CPSW_MAX_QUEUES;
+   ch->max_other = 0;
+   ch->other_count = 0;
+   ch->rx_count = priv->rx_ch_num;
+   ch->tx_count = priv->tx_ch_num;
+   ch->combined_count = 0;
+}
+
+static int cpsw_check_ch_settings(struct cpsw_priv *priv,
+ struct ethtool_channels *ch)
+{
+   if (ch->combined_count)
+   return -EINVAL;
+
+   /* verify we have at least one channel in each direction */
+   if (!ch->rx_count || !ch->tx_count)
+   return -EINVAL;
+
+   if (ch->rx_count > priv->data.channels ||
+   ch->tx_count > priv->data.channels)
+   return -EINVAL;
+
+   return 0;
+}
+
+static void cpsw_sync_dual_ch_list(struct net_device *sdev,
+  struct net_device *ddev)
+{
+   struct cpsw_priv *priv_s, *priv_d;
+   int i;
+
+   priv_s = netdev_priv(sdev);
+   priv_d = netdev_priv(ddev);
+
+   priv_d->rx_ch_num = priv_s->rx_ch_num;
+   priv_d->tx_ch_num = priv_s->tx_ch_num;
+
+   for (i = 0; i < priv_d->tx_ch_num; i++)
+   priv_d->txch[i] = priv_s->txch[i];
+   for (i = 0; i < priv_d->rx_ch_num; i++)
+   priv_d->rxch[i] = priv_s->rxch[i];
+}
+
+static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
+{
+   int (*poll)(struct napi_struct *, int);
+   void (*handler)(void *, int, int);
+   struct cpdma_chan **chan;
+   int *ch;
+   int ret;
+
+   if (rx) {
+   ch = &priv->rx_ch_num;
+   chan = priv->rxch;
+   handler = cpsw_rx_handler;
+   poll = cpsw_rx_poll;
+   } else {
+   ch = &priv->tx_ch_num;
+   chan = priv->txch;
+   handler = cpsw_tx_handler;
+   poll = cpsw_tx_poll;
+   }
+
+   while (*ch < ch_num) {
+   chan[*ch] = cpdma_chan_create(priv->dma, *ch, handler, rx);
+
+   if (IS_ERR(chan[*ch]))
+   return PTR_ERR(chan[*ch]);
+
+   if (!chan[*ch])
+   return -EINVAL;
+
+   dev_info(priv->dev, "created new %d %s channel\n", *ch,
+(rx ? "rx" : "tx"));
+   (*ch)++;
+   }
+
+   while (*ch > ch_num) {
+   int tch = *ch - 1;
+
+   ret = cpdma_chan_destroy(chan[tch]);
+   if (ret)
+   return ret;
+
+   dev_info(priv->dev, "destroyed %d %s channel\n", tch,
+(rx ? "rx" : "tx"));
+   (*ch)--;
+   }
+
+   return 0;
+}
+
+static int cpsw_update_channels(struct net_device *dev,
+   struct ethtool_channels *ch)
+{
+   struct cpsw_priv *priv;
+   int ret;
+
+   priv = netdev_priv(dev);
+
+   ret = cpsw_update_channels_res(priv, ch->rx_count, 1);
+   if (ret)
+   return ret;
+
+   ret = cpsw_update_channels_res(priv, ch->tx_count, 0);
+   if (ret)
+   return ret;
+
+   if (priv->data.dual_emac) {
+   int i;
+   /* mirror channels for another SL */
+   for (i = 0; i < priv->data.slaves; i++) {
+   if (priv->slaves[i].ndev == dev)
+   continue;
+
+   cpsw_sync_dual_ch_list(dev, priv->slaves[i].ndev);
+   }
+   }
+
+   return 0;
+}
+
+static int cpsw_set_channels(struct net_device *ndev,
+struct ethtool_channels *chs)
+{
+   struct 

[PATCH 1/4] net: ethernet: ti: davinci_cpdma: split descs num between all channels

2016-06-30 Thread Ivan Khoronzhuk
Currently the tx channels all use the same pool of descriptors, so
one channel can block another if it empties the pool. But the shaper
should decide which channel is allowed to send packets. To avoid
this impact of one channel on another, let every channel have its
own piece of the pool.
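
One possible way to carve up the pool is sketched below. The split policy used by the patch itself is not visible in the truncated diff, so the even split with the remainder handed to the first channels is an assumption, and `chan_desc_share()` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Number of descriptors reserved for channel idx when num_desc
 * descriptors are divided among num_chan channels. Every channel gets
 * an equal share; the first (num_desc % num_chan) channels get one
 * extra so that no descriptor is left unused.
 */
static int chan_desc_share(int num_desc, int num_chan, int idx)
{
	assert(num_desc >= 0);
	assert(num_chan > 0 && idx >= 0 && idx < num_chan);

	return num_desc / num_chan + (idx < num_desc % num_chan ? 1 : 0);
}
```

Because each channel can only consume its own share, a busy channel runs out of descriptors without starving the others.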

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  | 59 +
 drivers/net/ethernet/ti/davinci_cpdma.c | 54 --
 drivers/net/ethernet/ti/davinci_cpdma.h |  2 +-
 3 files changed, 89 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1a93a1f..a713336 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1230,6 +1230,39 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
}
 }
 
+static int cpsw_fill_rx_channels(struct net_device *ndev)
+{
+   struct cpsw_priv *priv = netdev_priv(ndev);
+   struct sk_buff *skb;
+   int ch_buf_num;
+   int i, ret;
+
+   ch_buf_num = cpdma_chan_get_rx_buf_num(priv->rxch);
+   for (i = 0; i < ch_buf_num; i++) {
+   skb = __netdev_alloc_skb_ip_align(ndev,
+ priv->rx_packet_max,
+ GFP_KERNEL);
+   if (!skb) {
+   dev_err(priv->dev, "cannot allocate skb\n");
+   return -ENOMEM;
+   }
+
+   ret = cpdma_chan_submit(priv->rxch, skb, skb->data,
+   skb_tailroom(skb), 0);
+   if (ret < 0) {
+   dev_err(priv->dev,
+   "cannot submit skb to rx channel, error %d\n",
+   ret);
+   kfree_skb(skb);
+   return ret;
+   }
+   }
+
+   cpsw_info(priv, ifup, "submitted %d rx descriptors\n", ch_buf_num);
+
+   return ch_buf_num;
+}
+
 static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_priv *priv)
 {
u32 slave_port;
@@ -1249,7 +1282,7 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, 
struct cpsw_priv *priv)
 static int cpsw_ndo_open(struct net_device *ndev)
 {
struct cpsw_priv *priv = netdev_priv(ndev);
-   int i, ret;
+   int ret;
u32 reg;
 
ret = pm_runtime_get_sync(&priv->pdev->dev);
@@ -1282,7 +1315,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
 
if (!cpsw_common_res_usage_state(priv)) {
struct cpsw_priv *priv_sl0 = cpsw_get_slave_priv(priv, 0);
-   int buf_num;
 
/* setup tx dma to fixed prio and zero offset */
cpdma_control_set(priv->dma, CPDMA_TX_PRIO_FIXED, 1);
@@ -1310,26 +1342,9 @@ static int cpsw_ndo_open(struct net_device *ndev)
enable_irq(priv->irqs_table[0]);
}
 
-   buf_num = cpdma_chan_get_rx_buf_num(priv->dma);
-   for (i = 0; i < buf_num; i++) {
-   struct sk_buff *skb;
-
-   ret = -ENOMEM;
-   skb = __netdev_alloc_skb_ip_align(priv->ndev,
-   priv->rx_packet_max, GFP_KERNEL);
-   if (!skb)
-   goto err_cleanup;
-   ret = cpdma_chan_submit(priv->rxch, skb, skb->data,
-   skb_tailroom(skb), 0);
-   if (ret < 0) {
-   kfree_skb(skb);
-   goto err_cleanup;
-   }
-   }
-   /* continue even if we didn't manage to submit all
-* receive descs
-*/
-   cpsw_info(priv, ifup, "submitted %d rx descriptors\n", i);
+   ret = cpsw_fill_rx_channels(ndev);
+   if (ret < 0)
+   goto err_cleanup;
 
if (cpts_register(&priv->pdev->dev, priv->cpts,
  priv->data.cpts_clock_mult,
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 1c653ca..2f4b571 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -106,6 +106,7 @@ struct cpdma_ctlr {
struct cpdma_desc_pool  *pool;
spinlock_t  lock;
struct cpdma_chan   *channels[2 * CPDMA_MAX_CHANNELS];
+   int chan_num;
 };
 
 struct cpdma_chan {
@@ -262,6 +263,7 @@ struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params 
*params)
ctlr->state = CPDMA_STATE_IDLE;
ctlr->params = *params;
ctlr->dev = params->dev;
+   ctlr->chan_num = 0;
spin_lock_init(&ctlr->lock);
 
ctlr->pool = cpdma_desc_pool_create(ctlr->dev,
@@ -479,6 +481,32 @@ void cpdma_ctlr_eoi(struct 

[PATCH 0/4] net: ethernet: ti: cpsw: add multi-queue support

2016-06-30 Thread Ivan Khoronzhuk
This series is intended to allow the cpsw driver to use its h/w
shaper ability to send/receive data with up to 8 tx and rx queues.
The series doesn't contain an interface to configure the h/w shaper
itself; it contains only the multi-queue support part and the
ability to configure the number of tx/rx queues with ethtool. The
default shaper mode is priority mode. The h/w shaper configuration
will be added in a separate patch series. This series doesn't
affect net throughput.

Tested on:
am572x-idk, 1Gbps link
am335-boneblack, 100Mbps link.

Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

A simple example for splitting traffic on queues:

#check how many queues are supported and active:
$ ethtool -l eth0

#increase the number of active rx and tx queues
#(by default 1 rx and 1 tx queue); any combination of
#0 < rx <= 8 and 0 < tx <= 8 can be set
$ ethtool -L eth0 rx 8 tx 8

#set multi-queue-aware queuing discipline
$ tc qdisc add dev eth0 root handle 1: multiq

#send packets with destination ip 172.22.39.12 to queue #5, which
#can be prioritized or throughput limited by the h/w shaper.
$ tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
match ip dst 172.22.39.12 \
action skbedit queue_mapping 5

#get statistics for active channels:
$ ethtool -S eth0

Ivan Khoronzhuk (4):
  net: ethernet: ti: davinci_cpdma: split descs num between all channels
  net: ethernet: ti: cpsw: add multi queue support
  net: ethernet: ti: davinci_cpdma: move cpdma channel struct macroses
to internals
  net: ethernet: ti: cpsw: add ethtool channels support

 drivers/net/ethernet/ti/cpsw.c  | 533 +---
 drivers/net/ethernet/ti/davinci_cpdma.c |  79 -
 drivers/net/ethernet/ti/davinci_cpdma.h |  13 +-
 drivers/net/ethernet/ti/davinci_emac.c  |   8 +-
 4 files changed, 505 insertions(+), 128 deletions(-)

-- 
1.9.1



Re: [PATCH net] net: bcmsysport: Device stats are unsigned long

2016-06-30 Thread Andrew Lunn
On Thu, Jun 30, 2016 at 10:56:29AM -0700, Florian Fainelli wrote:
> On 64bits kernels, device stats are 64bits wide, not 32bits.
> 
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
> driver")
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
> b/drivers/net/ethernet/broadcom/bcmsysport.c
> index 543bf38105c9..21f21e23e695 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -392,7 +392,11 @@ static void bcm_sysport_get_stats(struct net_device *dev,
>   else
>   p = (char *)priv;
>   p += s->stat_offset;
> - data[i] = *(u32 *)p;

Hi Florian

Could you not just change this cast from u32 to unsigned long and be
done?

struct net_device_stats {
        unsigned long   rx_packets;
        unsigned long   tx_packets;

Andrew
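
The width question raised in this thread can be illustrated with a small userspace sketch. It assumes the stats table mixes counters of different widths (32-bit counters next to `unsigned long` ones), which is only an inference from the quoted context; `struct stat_desc` and `read_stat()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: where a counter lives and how wide it is. */
struct stat_desc {
	size_t offset;
	size_t size;
};

/*
 * Read a counter that is either unsigned long or 32 bits wide and
 * widen it to 64 bits. Reading an unsigned long counter through a
 * 32-bit cast would drop the upper half on a 64-bit kernel.
 */
static uint64_t read_stat(const void *base, const struct stat_desc *s)
{
	const char *p = (const char *)base + s->offset;

	if (s->size == sizeof(unsigned long)) {
		unsigned long v;

		memcpy(&v, p, sizeof(v));
		return v;
	} else {
		uint32_t v;

		assert(s->size == sizeof(uint32_t));
		memcpy(&v, p, sizeof(v));
		return v;
	}
}
```

If every counter in the table were `unsigned long`, changing the cast alone would be enough; a per-entry size is only needed when the widths differ.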


Re: ethtool needs a new maintainer

2016-06-30 Thread John W. Linville
On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
> On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > I've become steadily less enthusiastic and less responsive as a
> > maintainer over the past year or so.  I no longer work on networking
> > regularly, so it takes a lot more time to get into the right state of
> > mind to think about ethtool code, while I have other demands on my time
> > that tend to take priority.
> > 
> > So, I would like to find a new maintainer to take over as soon as
> > possible.  Ideally the new maintainer would have previous contributions
> > to ethtool and an existing account on kernel.org so that they can push
> > to the git repository and the home page.  But neither of those is
> > essential.  Please reply if you're interested.
> 
> I would like to take this responsibility. My previous contributions
> to ethtool are meager, but I think my skills and interests are suited
> to the task.  Plus, I already have a kernel.org account... :-)

Are there any other takers?  Or is this a done deal?

John
-- 
John W. Linville                Someday the world will need a hero, and you
linvi...@tuxdriver.com          might be all we have.  Be ready.


Re: [RFC 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-06-30 Thread Ray Jui

Hi Jon,

On 6/28/2016 12:34 PM, Jon Mason wrote:

Signed-off-by: Jon Mason 
---
 .../devicetree/bindings/net/brcm,bgmac-enet.txt | 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt

diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt 
b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
new file mode 100644
index 0000000..efd36d5
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
@@ -0,0 +1,21 @@
+Broadcom GMAC Ethernet Controller Device Tree Bindings
+-
+
+Required properties:
+ - compatible: "brcm,bgmac-enet"
+ - reg:Address and length of the GMAC registers,
+   Address and length of the GMAC IDM registers


As we know, there will be additional optional register banks required for
some of the other SoCs that the current driver does not yet support. In
my opinion, we should consider making "reg-names" a mandatory property
now and map the register blocks based on names.


I think this will make our life easier in the future when new
optional SoC-specific register blocks are added, because we can map
the register blocks based on names instead of indices. Indices will
change and differ among SoCs, and will require much more complex
logic in the driver to deal with.
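
To illustrate the argument, here is a small userspace sketch of looking up register regions by name rather than by position. In a driver this role would be played by the platform resource lookup by name; `struct reg_region` and `find_region()` are hypothetical, and any region names used with them are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical register region, standing in for one "reg" entry. */
struct reg_region {
	const char *name;	/* matching "reg-names" entry */
	unsigned long base;
	unsigned long size;
};

/* Return the region called name, or NULL if this SoC does not have it. */
static const struct reg_region *
find_region(const struct reg_region *tbl, size_t n, const char *name)
{
	size_t i;

	assert(tbl != NULL && name != NULL);

	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];

	return NULL;
}
```

The same lookup works no matter how many optional regions a given SoC lists or in which order, and a missing optional region shows up as a NULL result instead of a shifted index.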



+ - interrupts: Interrupt number
+
+Optional properties:
+- mac-address: mac address to be assigned to the device
+
+Examples:
+
+gmac0: enet@18022000 {
+   compatible = "brcm,bgmac-enet";
+   reg = <0x18022000 0x1000>,
+ <0x1811 0x1000>;
+   interrupts = ;
+   status = "disabled";
+};



Btw, I think Rob Herring should be included in the review for device 
tree binding document changes.


Thanks,

Ray


Re: [RFC 5/7] net: ethernet: bgmac: Add platform device support

2016-06-30 Thread Ray Jui

Hi Jon,

On 6/28/2016 12:34 PM, Jon Mason wrote:

The bcma portion of the driver has been split off into a bcma specific
driver.  This has been mirrored for the platform driver.  The last
references to the bcma core struct have been changed into a generic
function call.  These function calls are wrappers to either the original
bcma code or new platform functions that access the same areas via MMIO.
This necessitated adding function pointers for both platform and bcma to
hide which backend is being used from the generic bgmac code.
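
The indirection described above can be sketched in userspace C as follows. This is not the bgmac code: `struct mac_dev`, the `plat_*` accessors and `mac_set_bits()` are hypothetical names, and a plain array stands in for the MMIO window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device: generic code only sees the function pointers. */
struct mac_dev {
	uint32_t (*read)(struct mac_dev *dev, uint16_t offset);
	void (*write)(struct mac_dev *dev, uint16_t offset, uint32_t value);
	uint32_t regs[16];	/* stand-in for MMIO space or a bus core */
};

/* "platform" backend: direct access to the register window. */
static uint32_t plat_read(struct mac_dev *dev, uint16_t offset)
{
	return dev->regs[offset / 4];
}

static void plat_write(struct mac_dev *dev, uint16_t offset, uint32_t value)
{
	dev->regs[offset / 4] = value;
}

/* Generic code: it never needs to know which backend is behind dev. */
static void mac_set_bits(struct mac_dev *dev, uint16_t offset, uint32_t bits)
{
	assert(dev->read && dev->write);
	dev->write(dev, offset, dev->read(dev, offset) | bits);
}
```

A second backend would fill in the same two pointers with its own bus accessors, and `mac_set_bits()` would work unchanged.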

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/Kconfig  |  23 +-
 drivers/net/ethernet/broadcom/Makefile |   4 +-
 drivers/net/ethernet/broadcom/bgmac-bcma.c | 315 
 drivers/net/ethernet/broadcom/bgmac-platform.c | 208 
 drivers/net/ethernet/broadcom/bgmac.c  | 327 -
 drivers/net/ethernet/broadcom/bgmac.h  |  73 +-
 6 files changed, 666 insertions(+), 284 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c

diff --git a/drivers/net/ethernet/broadcom/Kconfig 
b/drivers/net/ethernet/broadcom/Kconfig
index d74a92e..bd8c80c 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -140,10 +140,18 @@ config BNX2X_SRIOV
  allows for virtual function acceleration in virtual environments.

 config BGMAC
-   tristate "BCMA bus GBit core support"
+   tristate
+   help
+ This enables the integrated ethernet controller support for many
+ Broadcom (mostly iProc) SoCs. An appropriate bus interface driver
+ needs to be enabled to select this.
+
+config BGMAC_BCMA
+   tristate "Broadcom iProc GBit BCMA support"
depends on BCMA && BCMA_HOST_SOC
depends on HAS_DMA
depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
+   select BGMAC
select PHYLIB
select FIXED_PHY
---help---
@@ -152,6 +160,19 @@ config BGMAC
  In case of using this driver on BCM4706 it's also requires to enable
  BCMA_DRIVER_GMAC_CMN to make it work.

+config BGMAC_PLATFORM
+   tristate "Broadcom iProc GBit platform support"
+   depends on HAS_DMA
+   depends on ARCH_BCM_IPROC || COMPILE_TEST
+   depends on OF
+   select BGMAC
+   select PHYLIB
+   select FIXED_PHY
+   default ARCH_BCM_IPROC
+   ---help---
+ Say Y here if you want to use the Broadcom iProc Gigabit Ethernet
+ controller through the generic platform interface
+
 config SYSTEMPORT
tristate "Broadcom SYSTEMPORT internal MAC support"
depends on OF
diff --git a/drivers/net/ethernet/broadcom/Makefile 
b/drivers/net/ethernet/broadcom/Makefile
index f559794..79f2372 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -10,6 +10,8 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
-obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC) += bgmac.o
+obj-$(CONFIG_BGMAC_BCMA) += bgmac-bcma.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC_PLATFORM) += bgmac-platform.o
 obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
 obj-$(CONFIG_BNXT) += bnxt/
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c 
b/drivers/net/ethernet/broadcom/bgmac-bcma.c
new file mode 100644
index 000..9a9745c4
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
@@ -0,0 +1,315 @@
+/*
+ * Driver for (BCM4706)? GBit MAC core on BCMA bus.
+ *
+ * Copyright (C) 2012 Rafał Miłecki 
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include "bgmac.h"
+
+static inline bool bgmac_is_bcm4707_family(struct bcma_device *core)
+{
+   switch (core->bus->chipinfo.id) {
+   case BCMA_CHIP_ID_BCM4707:
+   case BCMA_CHIP_ID_BCM47094:
+   case BCMA_CHIP_ID_BCM53018:
+   return true;
+   default:
+   return false;
+   }
+}
+
+/**
+ * BCMA bus ops
+ **/
+
+static u32 bcma_bgmac_read(struct bgmac *bgmac, u16 offset)
+{
+   return bcma_read32(bgmac->bcma.core, offset);
+}
+
+static void bcma_bgmac_write(struct bgmac *bgmac, u16 offset, u32 value)
+{
+   bcma_write32(bgmac->bcma.core, offset, value);
+}
+
+static u32 bcma_bgmac_idm_read(struct bgmac *bgmac, u16 offset)
+{
+   return bcma_aread32(bgmac->bcma.core, offset);
+}
+
+static void bcma_bgmac_idm_write(struct bgmac *bgmac, u16 offset, u32 value)
+{
+   return bcma_awrite32(bgmac->bcma.core, offset, value);
+}
+
+static bool bcma_bgmac_clk_enabled(struct bgmac *bgmac)
+{
+   return 

[PATCH net] net: bcmsysport: Device stats are unsigned long

2016-06-30 Thread Florian Fainelli
On 64-bit kernels, device stats are 64 bits wide, not 32 bits.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
driver")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 543bf38105c9..21f21e23e695 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -392,7 +392,11 @@ static void bcm_sysport_get_stats(struct net_device *dev,
else
p = (char *)priv;
p += s->stat_offset;
-   data[i] = *(u32 *)p;
+   if (sizeof(unsigned long) != sizeof(u32) &&
+   s->stat_sizeof == sizeof(unsigned long))
+   data[i] = *(unsigned long *)p;
+   else
+   data[i] = *(u32 *)p;
}
 }
 
-- 
2.7.4
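
For readers following the bcmsysport fix above, here is a minimal userspace sketch of the same idea: each counter records its own size, and the reader widens it accordingly instead of always dereferencing a u32. The names demo_priv, demo_stat and demo_read_stat are hypothetical and are not part of the driver; memcpy() is used only to keep the sketch free of aliasing problems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's private state. */
struct demo_priv {
	unsigned long rx_packets;	/* netdev-style counter */
	uint32_t mib_rx_crc;		/* hardware MIB counter, always 32 bits */
};

/* Hypothetical stat descriptor: size and offset of one counter. */
struct demo_stat {
	size_t stat_sizeof;
	size_t stat_offset;
};

#define DEMO_STAT(m) \
	{ sizeof(((struct demo_priv *)0)->m), offsetof(struct demo_priv, m) }

static const struct demo_stat demo_stats[] = {
	DEMO_STAT(rx_packets),
	DEMO_STAT(mib_rx_crc),
};

#define DEMO_NSTATS (sizeof(demo_stats) / sizeof(demo_stats[0]))

/* Read counter i, widening it according to its recorded size. */
static uint64_t demo_read_stat(const struct demo_priv *priv, size_t i)
{
	const char *p;

	assert(i < DEMO_NSTATS);
	p = (const char *)priv + demo_stats[i].stat_offset;

	if (sizeof(unsigned long) != sizeof(uint32_t) &&
	    demo_stats[i].stat_sizeof == sizeof(unsigned long)) {
		unsigned long v;

		memcpy(&v, p, sizeof(v));	/* full-width counter */
		return v;
	} else {
		uint32_t v;

		memcpy(&v, p, sizeof(v));	/* 32-bit counter */
		return v;
	}
}
```

With a plain u32 read, the upper half of rx_packets would be lost on a 64-bit build; the size check is what keeps both kinds of counter correct.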



[PATCH] wcn36xx: Implement print_reg indication

2016-06-30 Thread Bjorn Andersson
Some firmware versions send a "print register indication"; handle this
by printing out the content.

Cc: Nicolas Dechesne 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/hal.h | 16 
 drivers/net/wireless/ath/wcn36xx/smd.c | 30 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index 4f87ef1e1eb8..b765c647319d 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -350,6 +350,8 @@ enum wcn36xx_hal_host_msg_type {
 
WCN36XX_HAL_AVOID_FREQ_RANGE_IND = 233,
 
+   WCN36XX_HAL_PRINT_REG_INFO_IND = 259,
+
WCN36XX_HAL_MSG_MAX = WCN36XX_HAL_MSG_TYPE_MAX_ENUM_SIZE
 };
 
@@ -4703,4 +4705,18 @@ struct stats_class_b_ind {
u32 rx_time_total;
 };
 
+/* WCN36XX_HAL_PRINT_REG_INFO_IND */
+struct wcn36xx_hal_print_reg_info_ind {
+   struct wcn36xx_hal_msg_header header;
+
+   u32 count;
+   u32 scenario;
+   u32 reason;
+
+   struct {
+   u32 addr;
+   u32 value;
+   } regs[];
+} __packed;
+
 #endif /* _HAL_H_ */
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 87a62eb6228c..28d6ca0ca819 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -2109,6 +2109,30 @@ static int wcn36xx_smd_delete_sta_context_ind(struct 
wcn36xx *wcn,
return -ENOENT;
 }
 
+static int wcn36xx_smd_print_reg_info_ind(struct wcn36xx *wcn,
+ void *buf,
+ size_t len)
+{
+   struct wcn36xx_hal_print_reg_info_ind *rsp = buf;
+   int i;
+
+   if (len < sizeof(*rsp)) {
+   wcn36xx_warn("Corrupted print reg info indication\n");
+   return -EIO;
+   }
+
+   wcn36xx_dbg(WCN36XX_DBG_HAL,
+   "reginfo indication, scenario: 0x%x reason: 0x%x\n",
+   rsp->scenario, rsp->reason);
+
+   for (i = 0; i < rsp->count; i++) {
+   wcn36xx_dbg(WCN36XX_DBG_HAL, "\t0x%x: 0x%x\n",
+   rsp->regs[i].addr, rsp->regs[i].value);
+   }
+
+   return 0;
+}
+
 int wcn36xx_smd_update_cfg(struct wcn36xx *wcn, u32 cfg_id, u32 value)
 {
struct wcn36xx_hal_update_cfg_req_msg msg_body, *body;
@@ -2238,6 +2262,7 @@ int wcn36xx_smd_rsp_process(struct qcom_smd_channel 
*channel,
case WCN36XX_HAL_OTA_TX_COMPL_IND:
case WCN36XX_HAL_MISSED_BEACON_IND:
case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
msg_ind = kmalloc(sizeof(*msg_ind) + len, GFP_ATOMIC);
if (!msg_ind) {
/*
@@ -2300,6 +2325,11 @@ static void wcn36xx_ind_smd_work(struct work_struct 
*work)
   hal_ind_msg->msg,
   hal_ind_msg->msg_len);
break;
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
+   wcn36xx_smd_print_reg_info_ind(wcn,
+  hal_ind_msg->msg,
+  hal_ind_msg->msg_len);
+   break;
default:
wcn36xx_err("SMD_EVENT (%d) not supported\n",
  msg_header->msg_type);
-- 
2.5.0
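
A side note on the indication handler above: it checks that the buffer covers the fixed part of the message, but the loop trusts rsp->count. The following userspace sketch shows one way such a count could be validated against the received length. It is an illustration under assumptions, not the driver's code: demo_reg_info_ind mirrors the layout without the HAL header, and demo_reg_info_count is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of one register entry. */
struct demo_reg {
	uint32_t addr;
	uint32_t value;
};

/* Hypothetical mirror of the indication body (HAL header omitted). */
struct demo_reg_info_ind {
	uint32_t count;
	uint32_t scenario;
	uint32_t reason;
	struct demo_reg regs[];
};

/*
 * Return the number of register entries that can safely be read from
 * buf, or -1 if the message is shorter than its own count claims.
 */
static long demo_reg_info_count(const void *buf, size_t len)
{
	uint32_t count;

	if (len < sizeof(struct demo_reg_info_ind))
		return -1;

	memcpy(&count,
	       (const char *)buf + offsetof(struct demo_reg_info_ind, count),
	       sizeof(count));

	/* Dividing avoids overflow in count * sizeof(struct demo_reg). */
	if (count > (len - sizeof(struct demo_reg_info_ind)) /
		    sizeof(struct demo_reg))
		return -1;

	return (long)count;
}
```

Whether the firmware can ever send an inconsistent count is not something this thread establishes; the check is cheap enough that it may be worth having regardless.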



[PATCH net-next v3 4/4] cgroup: bpf: Add an example to do cgroup checking in BPF

2016-06-30 Thread Martin KaFai Lau
test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_CGROUP_ARRAY),
populates/updates it with a cgroup2-backed fd and pins it to a
bpf-fs file.  The pinned file can be loaded by tc and then used
by the bpf prog later.  This program can also update an existing pinned
array, which could be useful for debugging/testing purposes.

test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc.  It demonstrates
the usage of bpf_skb_in_cgroup.

test_cgrp2_tc.sh:
A script that glues test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together.  The idea is:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
   with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
   dropped because of a match on the cgroup

Most of the lines in test_cgrp2_tc.sh are boilerplate
to set up the cgroup/bpf-fs/net-devices/netns, etc.  It is
not bulletproof on errors but should work well enough and
give enough debug info if things do not go well.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 samples/bpf/Makefile   |   3 +
 samples/bpf/bpf_helpers.h  |   2 +
 samples/bpf/test_cgrp2_array_pin.c | 109 ++
 samples/bpf/test_cgrp2_tc.sh   | 184 +
 samples/bpf/test_cgrp2_tc_kern.c   |  69 ++
 5 files changed, 367 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_array_pin.c
 create mode 100755 samples/bpf/test_cgrp2_tc.sh
 create mode 100644 samples/bpf/test_cgrp2_tc_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0bf2478..a98b780 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -20,6 +20,7 @@ hostprogs-y += offwaketime
 hostprogs-y += spintest
 hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
+hostprogs-y += test_cgrp2_array_pin
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -40,6 +41,7 @@ offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
 spintest-objs := bpf_load.o libbpf.o spintest_user.o
 map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
+test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -61,6 +63,7 @@ always += map_perf_test_kern.o
 always += test_overhead_tp_kern.o
 always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
+always += test_cgrp2_tc_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 7904a2a..84e3fd9 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -70,6 +70,8 @@ static int (*bpf_l3_csum_replace)(void *ctx, int off, int 
from, int to, int flag
(void *) BPF_FUNC_l3_csum_replace;
 static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int 
flags) =
(void *) BPF_FUNC_l4_csum_replace;
+static int (*bpf_skb_in_cgroup)(void *ctx, void *map, int index) =
+   (void *) BPF_FUNC_skb_in_cgroup;
 
 #if defined(__x86_64__)
 
diff --git a/samples/bpf/test_cgrp2_array_pin.c 
b/samples/bpf/test_cgrp2_array_pin.c
new file mode 100644
index 000..70e86f7
--- /dev/null
+++ b/samples/bpf/test_cgrp2_array_pin.c
@@ -0,0 +1,109 @@
+/* Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "libbpf.h"
+
+static void usage(void)
+{
+   printf("Usage: test_cgrp2_array_pin [...]\n");
+   printf("   -FFile to pin an BPF cgroup array\n");
+   printf("   -UUpdate an already pinned BPF cgroup 
array\n");
+   printf("   -v   Full path of the cgroup2\n");
+   printf("   -h  Display this help\n");
+}
+
+int main(int argc, char **argv)
+{
+   const char *pinned_file = NULL, *cg2 = NULL;
+   int create_array = 1;
+   int array_key = 0;
+   int array_fd = -1;
+   int cg2_fd = -1;
+   int ret = -1;
+   int opt;
+
+   while ((opt = getopt(argc, argv, "F:U:v:")) != -1) {
+   switch (opt) {
+   /* General args */
+   case 'F':
+   pinned_file = optarg;
+   break;
+   case 'U':
+   pinned_file = optarg;
+   create_array = 0;
+   break;
+   case 'v':
+   cg2 = optarg;
+   break;
+   default:
+ 

[PATCH net-next v3 3/4] cgroup: bpf: Add bpf_skb_in_cgroup_proto

2016-06-30 Thread Martin KaFai Lau
Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
belongs to a descendant of a cgroup2.  It is similar to the
feature added in netfilter:
commit c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")

The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
which will be used by the bpf_skb_in_cgroup.

The bpf verifier is modified to ensure that BPF_MAP_TYPE_CGROUP_ARRAY
and bpf_skb_in_cgroup() are always used together.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h | 12 
 kernel/bpf/verifier.c|  8 +++-
 net/core/filter.c| 38 ++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ef4e386..bad309f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -314,6 +314,18 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_get_tunnel_opt,
BPF_FUNC_skb_set_tunnel_opt,
+
+   /**
+* bpf_skb_in_cgroup(skb, map, index) - Check cgroup2 membership of skb
+* @skb: pointer to skb
+* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+* @index: index of the cgroup in the bpf_map
+* Return:
+*   == 0 skb failed the cgroup2 descendant test
+*   == 1 skb succeeded the cgroup2 descendant test
+*< 0 error
+*/
+   BPF_FUNC_skb_in_cgroup,
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0f6db58..68753e0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1063,7 +1063,9 @@ static int check_map_func_compatibility(struct bpf_map 
*map, int func_id)
goto error;
break;
case BPF_MAP_TYPE_CGROUP_ARRAY:
-   goto error;
+   if (func_id != BPF_FUNC_skb_in_cgroup)
+   goto error;
+   break;
default:
break;
}
@@ -1083,6 +1085,10 @@ static int check_map_func_compatibility(struct bpf_map 
*map, int func_id)
if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
goto error;
break;
+   case BPF_FUNC_skb_in_cgroup:
+   if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY)
+   goto error;
+   break;
default:
break;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index df6860c..8134c98 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2024,6 +2024,40 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
}
 }
 
+#ifdef CONFIG_SOCK_CGROUP_DATA
+static u64 bpf_skb_in_cgroup(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+   struct sk_buff *skb = (struct sk_buff *)(long)r1;
+   struct bpf_map *map = (struct bpf_map *)(long)r2;
+   struct bpf_array *array = container_of(map, struct bpf_array, map);
+   struct cgroup *cgrp;
+   struct sock *sk;
+   u32 i = (u32)r3;
+
+   sk = skb->sk;
+   if (!sk || !sk_fullsock(sk))
+   return -ENOENT;
+
+   if (unlikely(i >= array->map.max_entries))
+   return -E2BIG;
+
+   cgrp = READ_ONCE(array->ptrs[i]);
+   if (unlikely(!cgrp))
+   return -EAGAIN;
+
+   return cgroup_is_descendant(sock_cgroup_ptr(&sk->sk_cgrp_data), cgrp);
+}
+
+static const struct bpf_func_proto bpf_skb_in_cgroup_proto = {
+   .func   = bpf_skb_in_cgroup,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_CONST_MAP_PTR,
+   .arg3_type  = ARG_ANYTHING,
+};
+#endif
+
 static const struct bpf_func_proto *
 sk_filter_func_proto(enum bpf_func_id func_id)
 {
@@ -2086,6 +2120,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_get_route_realm_proto;
case BPF_FUNC_perf_event_output:
return bpf_get_event_output_proto();
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   case BPF_FUNC_skb_in_cgroup:
+   return &bpf_skb_in_cgroup_proto;
+#endif
default:
return sk_filter_func_proto(func_id);
}
-- 
2.5.1
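
To make the return-value contract of bpf_skb_in_cgroup() above easier to follow, here is a small userspace model of it. This is only a sketch under assumptions: demo_cgroup is a hypothetical type with a bare parent pointer, and the ancestry walk (which treats a cgroup as a descendant of itself) is assumed to match what cgroup_is_descendant() does in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cgroup model: only the parent link matters here. */
struct demo_cgroup {
	const struct demo_cgroup *parent;	/* NULL for the root */
};

/* Walk up from cgrp; a cgroup counts as a descendant of itself. */
static int demo_is_descendant(const struct demo_cgroup *cgrp,
			      const struct demo_cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return 1;
	return 0;
}

/*
 * Model of the helper's checks, in the same order as the patch:
 * no usable socket, index out of range, slot not populated yet,
 * and finally the ancestry test itself.
 */
static int demo_skb_in_cgroup(const struct demo_cgroup *sk_cgrp,
			      const struct demo_cgroup *const *array,
			      unsigned int max_entries, unsigned int idx)
{
	if (!sk_cgrp)
		return -ENOENT;
	if (idx >= max_entries)
		return -E2BIG;
	if (!array[idx])
		return -EAGAIN;
	return demo_is_descendant(sk_cgrp, array[idx]);
}
```

A tc program using the real helper would therefore need to treat negative values separately from 0 and 1, since an unpopulated slot (-EAGAIN) is not the same as a failed match.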



[PATCH net-next v3 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-30 Thread Martin KaFai Lau
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c| 43 +++
 kernel/bpf/syscall.c |  3 ++-
 kernel/bpf/verifier.c|  2 ++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406459b..ef4e386 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_PERCPU_ARRAY,
BPF_MAP_TYPE_STACK_TRACE,
+   BPF_MAP_TYPE_CGROUP_ARRAY,
 };
 
 enum bpf_prog_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5af3073..588d66e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -539,3 +539,46 @@ static int __init register_perf_event_array_map(void)
return 0;
 }
 late_initcall(register_perf_event_array_map);
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
+struct file *map_file /* not used */,
+int fd)
+{
+   return cgroup_get_from_fd(fd);
+}
+
+static void cgroup_fd_array_put_ptr(void *ptr)
+{
+   /* cgroup_put free cgrp after a rcu grace period */
+   cgroup_put(ptr);
+}
+
+static void cgroup_fd_array_free(struct bpf_map *map)
+{
+   bpf_fd_array_map_clear(map);
+   fd_array_map_free(map);
+}
+
+static const struct bpf_map_ops cgroup_array_ops = {
+   .map_alloc = fd_array_map_alloc,
+   .map_free = cgroup_fd_array_free,
+   .map_get_next_key = array_map_get_next_key,
+   .map_lookup_elem = fd_array_map_lookup_elem,
+   .map_delete_elem = fd_array_map_delete_elem,
+   .map_fd_get_ptr = cgroup_fd_array_get_ptr,
+   .map_fd_put_ptr = cgroup_fd_array_put_ptr,
+};
+
+static struct bpf_map_type_list cgroup_array_type __read_mostly = {
+   .ops = &cgroup_array_ops,
+   .type = BPF_MAP_TYPE_CGROUP_ARRAY,
+};
+
+static int __init register_cgroup_array_map(void)
+{
+   bpf_register_map_type(&cgroup_array_type);
+   return 0;
+}
+late_initcall(register_cgroup_array_map);
+#endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c23a4e93..cac13f1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr)
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+  map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+  map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
   attr->flags);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 668e079..0f6db58 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1062,6 +1062,8 @@ static int check_map_func_compatibility(struct bpf_map 
*map, int func_id)
if (func_id != BPF_FUNC_get_stackid)
goto error;
break;
+   case BPF_MAP_TYPE_CGROUP_ARRAY:
+   goto error;
default:
break;
}
-- 
2.5.1



[Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Cong Wang
Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum 
validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim 
Cc: Tom Herbert 
Signed-off-by: Cong Wang 
---
 include/linux/skbuff.h | 19 +++
 net/core/skbuff.c  | 18 --
 net/sched/act_mirred.c |  2 +-
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ee38a41..61ab566 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2870,6 +2870,25 @@ static inline void skb_postpush_rcsum(struct sk_buff 
*skb,
 }
 
 /**
+ * skb_push_rcsum - push skb and update receive checksum
+ * @skb: buffer to update
+ * @len: length of data pulled
+ *
+ * This function performs an skb_push on the packet and updates
+ * the CHECKSUM_COMPLETE checksum.  It should be used on
+ * receive path processing instead of skb_push unless you know
+ * that the checksum difference is zero (e.g., a valid IP header)
+ * or you are setting ip_summed to CHECKSUM_NONE.
+ */
+static inline unsigned char *skb_push_rcsum(struct sk_buff *skb,
+   unsigned int len)
+{
+   skb_push(skb, len);
+   skb_postpush_rcsum(skb, skb->data, len);
+   return skb->data;
+}
+
+/**
  * pskb_trim_rcsum - trim received skb and update checksum
  * @skb: buffer to trim
  * @len: new length
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f2b77e5..eb12d21 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3016,24 +3016,6 @@ int skb_append_pagefrags(struct sk_buff *skb, struct 
page *page,
 EXPORT_SYMBOL_GPL(skb_append_pagefrags);
 
 /**
- * skb_push_rcsum - push skb and update receive checksum
- * @skb: buffer to update
- * @len: length of data pulled
- *
- * This function performs an skb_push on the packet and updates
- * the CHECKSUM_COMPLETE checksum.  It should be used on
- * receive path processing instead of skb_push unless you know
- * that the checksum difference is zero (e.g., a valid IP header)
- * or you are setting ip_summed to CHECKSUM_NONE.
- */
-static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len)
-{
-   skb_push(skb, len);
-   skb_postpush_rcsum(skb, skb->data, len);
-   return skb->data;
-}
-
-/**
  * skb_pull_rcsum - pull skb and update receive checksum
  * @skb: buffer to update
  * @len: length of data pulled
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 128942b..1f5bd6c 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -181,7 +181,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
 
if (!(at & AT_EGRESS)) {
if (m->tcfm_ok_push)
-   skb_push(skb2, skb->mac_len);
+   skb_push_rcsum(skb2, skb->mac_len);
}
 
/* mirror is always swallowed */
-- 
2.1.0
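
The reason a plain skb_push() is not enough in the mirred patch above can be shown with a small userspace sketch. With CHECKSUM_COMPLETE, the stored sum is expected to cover the bytes currently visible in the buffer, so exposing the MAC header again means adding that header's contribution to the sum. Everything below is illustrative: demo_skb, demo_csum_partial and demo_push_rcsum are hypothetical names, the sum is kept unfolded in 32 bits, and lengths are assumed to be even to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement style sum of 16-bit words; len is assumed even. */
static uint32_t demo_csum_partial(const uint8_t *data, size_t len,
				  uint32_t sum)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	return sum;
}

/* Fold the 32-bit accumulator down to 16 bits with end-around carry. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical buffer model: csum covers data[0..len). */
struct demo_skb {
	uint8_t *data;
	size_t len;
	uint32_t csum;
};

/* Expose hlen more bytes in front and account for them in csum. */
static uint8_t *demo_push_rcsum(struct demo_skb *skb, size_t hlen)
{
	skb->data -= hlen;
	skb->len += hlen;
	skb->csum = demo_csum_partial(skb->data, hlen, skb->csum);
	return skb->data;
}
```

After the push, the stored sum matches a sum recomputed over the whole visible buffer; with a bare pointer adjustment it would still describe only the payload, which is the kind of mismatch the patch addresses.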



[PATCH net-next v3 1/4] cgroup: Add cgroup_get_from_fd

2016-06-30 Thread Martin KaFai Lau
Add a helper function to get a cgroup2 from a fd.  It will be
stored in a bpf array (BPF_MAP_TYPE_CGROUP_ARRAY) which will
be introduced in a later patch.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Tejun Heo 
---
 include/linux/cgroup.h |  1 +
 kernel/cgroup.c| 35 +++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..984f73b 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,6 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct 
dentry *dentry,
   struct cgroup_subsys 
*ss);
 
 struct cgroup *cgroup_get_from_path(const char *path);
+struct cgroup *cgroup_get_from_fd(int fd);
 
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 86cb5c6..14617968 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -6205,6 +6206,40 @@ struct cgroup *cgroup_get_from_path(const char *path)
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 
+/**
+ * cgroup_get_from_fd - get a cgroup pointer from a fd
+ * @fd: fd obtained by open(cgroup2_dir)
+ *
+ * Find the cgroup from a fd which should be obtained
+ * by opening a cgroup directory.  Returns a pointer to the
+ * cgroup on success. ERR_PTR is returned if the cgroup
+ * cannot be found.
+ */
+struct cgroup *cgroup_get_from_fd(int fd)
+{
+   struct cgroup_subsys_state *css;
+   struct cgroup *cgrp;
+   struct file *f;
+
+   f = fget_raw(fd);
+   if (!f)
+   return ERR_PTR(-EBADF);
+
+   css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
+   fput(f);
+   if (IS_ERR(css))
+   return ERR_CAST(css);
+
+   cgrp = css->cgroup;
+   if (!cgroup_on_dfl(cgrp)) {
+   cgroup_put(cgrp);
+   return ERR_PTR(-EBADF);
+   }
+
+   return cgrp;
+}
+EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
+
 /*
  * sock->sk_cgrp_data handling.  For more info, see sock_cgroup_data
  * definition in cgroup-defs.h.
-- 
2.5.1



[PATCH net-next v3 0/4] cgroup: bpf: cgroup2 membership test on skb

2016-06-30 Thread Martin KaFai Lau
v3:
- Remove WARN_ON_ONCE(!rcu_read_lock_held())
- Stop BPF_MAP_TYPE_CGROUP_ARRAY usage in patch 2/4
- Avoid mounting bpf fs manually in patch 4/4

- Thanks for Daniel's review and the above suggestions

- Check CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS.  Thanks to
  the kbuild bot's report.
  Patch 2/4 only needs CONFIG_CGROUPS while patch 3/4 needs
  CONFIG_SOCK_CGROUP_DATA.  Since a single bpf cgrp2 array alone is
  not useful for now, CONFIG_SOCK_CGROUP_DATA is also used in
  patch 2/4.  We can fine tune it later if we find other use cases
  for the cgrp2 array.
- Return EAGAIN instead of ENOENT if the cgrp2 array entry is
  NULL.  This distinguishes two cases: 1) userland has not
  populated this array entry yet, or 2) no cgrp2 was found from the skb.

- Belated thanks to Alexei and Tejun for reviewing v1 and giving advice on
  this work.

v2:
- Fix two return cases in cgroup_get_from_fd()
- Fix compilation errors when CONFIG_CGROUPS is not used:
  - arraymap.c: avoid registering BPF_MAP_TYPE_CGROUP_ARRAY
  - filter.c: tc_cls_act_func_proto() returns NULL on BPF_FUNC_skb_in_cgroup
- Add comments to BPF_FUNC_skb_in_cgroup and cgroup_get_from_fd()

v1 cover letter:
This series implements a bpf way to
check the cgroup2 membership of a skb (sk_buff).

It is similar to the feature added in netfilter:
c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")

The current target is the tc-like usage.



Re: [PATCH v3] wlcore: Add support for get_expected_throughput opcode

2016-06-30 Thread kbuild test robot
Hi,

[auto build test ERROR on wireless-drivers-next/master]
[also build test ERROR on v4.7-rc5 next-20160630]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Maxim-Altshul/wlcore-Add-support-for-get_expected_throughput-opcode/20160630-234034
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git 
master
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha 

All error/warnings (new ones prefixed by >>):

   drivers/net/wireless/ti/wlcore/main.c: In function 
'wlcore_op_get_expected_throughput':
>> drivers/net/wireless/ti/wlcore/main.c:5678:28: error: 'struct 
>> wl1271_station' has no member named 'wl'
 struct wl1271 *wl = wl_sta->wl;
   ^
>> drivers/net/wireless/ti/wlcore/main.c:5682:26: error: 'struct wl1271_link' 
>> has no member named 'fw_rate_mbps'
  return (wl->links[hlid].fw_rate_mbps * 1000);
 ^
>> drivers/net/wireless/ti/wlcore/main.c:5683:1: warning: control reaches end 
>> of non-void function [-Wreturn-type]
}
^

vim +5678 drivers/net/wireless/ti/wlcore/main.c

  5672  mutex_unlock(&wl->mutex);
  5673  }
  5674  
  5675  static u32 wlcore_op_get_expected_throughput(struct ieee80211_sta *sta)
  5676  {
  5677  struct wl1271_station *wl_sta = (struct wl1271_station 
*)sta->drv_priv;
> 5678  struct wl1271 *wl = wl_sta->wl;
  5679  u8 hlid = wl_sta->hlid;
  5680  
  5681  /* return in units of Kbps */
> 5682  return (wl->links[hlid].fw_rate_mbps * 1000);
> 5683  }
  5684  
  5685  static bool wl1271_tx_frames_pending(struct ieee80211_hw *hw)
  5686  {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-30 Thread Cong Wang
On Wed, Jun 29, 2016 at 7:23 AM, Tariq Toukan  wrote:
> Hi Cong,
>
>> See below. Does commit f8c6455bb04b944edb69e rely on any firmware
>> change to get an expected checksum?
>>
>> $ lspci -nn | grep -i mellanox
>> 82:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500
>> Family [ConnectX-3] [15b3:1003]
>>
>> $ ethtool -i eth0
>> driver: mlx4_en
>> version: 2.2-1 (Feb 2014)
>> firmware-version: 2.33.5220
>> bus-info: :82:00.0
>
> I used same HW and FW, but was not able to reproduce.
> I have kernel HEAD of net-next: 8a79813c1401 net: ethernet: dwc_eth_qos: use
> phy_ethtool_{get|set}_link_ksettings
>
> I ran regular ping, and ping6. Checksum complete counter increases, but no
> warnings in dmesg.

Thanks for testing it! This helped me a lot in identifying the real bug.
Actually, the bug is in the mirred action rather than the mlx4 driver. I have
a patch and just confirmed that it fixes the bug. I will send it out in a
few minutes.

Again, thanks a lot for your help!


Re: [net-next PATCH v2 1/2] net: pktgen: support injecting packets for qdisc testing

2016-06-30 Thread John Fastabend
On 16-06-30 01:37 AM, Jesper Dangaard Brouer wrote:
> On Wed, 29 Jun 2016 13:03:06 -0700
> John Fastabend  wrote:
> 
>> Add another xmit_mode to pktgen to allow testing xmit functionality
>> of qdiscs. The new mode "queue_xmit" injects packets at
>> __dev_queue_xmit() so that qdisc is called.
>>
>> Signed-off-by: John Fastabend 
> 
> I generally like this.
> 

[...]

>> @@ -3434,6 +3442,36 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>>  #endif
>>  } while (--burst > 0);
>>  goto out; /* Skips xmit_mode M_START_XMIT */
>> +} else if (pkt_dev->xmit_mode == M_QUEUE_XMIT) {
>> +local_bh_disable();
>> +atomic_add(burst, &pkt_dev->skb->users);
> 
> Reading the code, people might think that "burst" is allowed for this
> mode, which it is not. (You do handle this earlier in this patch when
> configuring this mode).

Right, we never get here without burst == 1, but it does read
a bit strangely. I'll use atomic_inc().


Thanks,
John



Re: [net-next PATCH 1/2] net: pktgen: support injecting packets for qdisc testing

2016-06-30 Thread John Fastabend
On 16-06-30 03:21 AM, Jamal Hadi Salim wrote:
> On 16-06-29 03:47 PM, John Fastabend wrote:
>> Add another xmit_mode to pktgen to allow testing xmit functionality
>> of qdiscs. The new mode "queue_xmit" injects packets at
>> __dev_queue_xmit() so that qdisc is called.
>>
>> Signed-off-by: John Fastabend 
>> ---

[...]

> 
> Acked-by: Jamal Hadi Salim 
> 
> In travel mode, dont have much cycles right now - but can you review
> again:
> http://www.spinics.net/lists/netdev/msg359545.html
> I think you should disallow clone for example and i wasnt sure if you
> covered all error scenarios etc.
> 

Taking a look at the link, a couple of differences exist. First, the linked
patch does a 'netif_xmit_frozen_or_drv_stopped(txq)' check, but this
really shouldn't be needed; it is handled by the sch_direct_xmit()
logic in ./net/sched.

Also, in this patch I made it much more conservative about when to back
off than my original patch, so it is now closer to the linked one, except
that I also back off on return code NET_XMIT_CN.

As for clones what is the concern exactly? We allow them through the
ingress pktgen mode that can hit classifiers and I don't see any issues
testing with them.

.John

> cheers,
> jamal



Re: [PATCH net-next 19/19] rxrpc: Use RCU to access a peer's service connection tree

2016-06-30 Thread David Howells
David Howells  wrote:

> > You want rb_link_node_rcu() here.
> 
> Should there be an rb_replace_node_rcu() also?

Or I could make rb_replace_node() RCU friendly.  What do you think of the
attached changes (split into appropriate patches)?  It's a case of changing
the order in which pointers are set in the rbtree code and inserting a
barrier.

I also wonder if rb_insert_color() needs some attention - though possibly
that's okay as it doesn't start with unset pointers (since you call
rb_link_node_rcu() first).

David
---
diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
index 14d7b831b63a..d076183e49be 100644
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -130,6 +130,19 @@ __rb_change_child(struct rb_node *old, struct rb_node *new,
WRITE_ONCE(root->rb_node, new);
 }
 
+static inline void
+__rb_change_child_rcu(struct rb_node *old, struct rb_node *new,
+ struct rb_node *parent, struct rb_root *root)
+{
+   if (parent) {
+   if (parent->rb_left == old)
+   rcu_assign_pointer(parent->rb_left, new);
+   else
+   rcu_assign_pointer(parent->rb_right, new);
+   } else
+   rcu_assign_pointer(root->rb_node, new);
+}
+
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
 
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 1356454e36de..2b1a190c737c 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -539,15 +539,17 @@ void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 {
struct rb_node *parent = rb_parent(victim);
 
+   /* Copy the pointers/colour from the victim to the replacement */
+   *new = *victim;
+
/* Set the surrounding nodes to point to the replacement */
-   __rb_change_child(victim, new, parent, root);
if (victim->rb_left)
rb_set_parent(victim->rb_left, new);
if (victim->rb_right)
rb_set_parent(victim->rb_right, new);
 
-   /* Copy the pointers/colour from the victim to the replacement */
-   *new = *victim;
+   /* Set the onward pointer last with an RCU barrier */
+   __rb_change_child_rcu(victim, new, parent, root);
 }
 EXPORT_SYMBOL(rb_replace_node);
 
diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
index dc64211c5ee8..298ec300cfcc 100644
--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -41,14 +41,14 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 */
read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
 
-   p = peer->service_conns.rb_node;
+   p = rcu_dereference(peer->service_conns.rb_node);
while (p) {
conn = rb_entry(p, struct rxrpc_connection, service_node);
 
if (conn->proto.index_key < k.index_key)
-   p = p->rb_left;
+   p = rcu_dereference(p->rb_left);
else if (conn->proto.index_key > k.index_key)
-   p = p->rb_right;
+   p = rcu_dereference(p->rb_right);
else
goto done;
conn = NULL;
@@ -90,7 +90,7 @@ rxrpc_publish_service_conn(struct rxrpc_peer *peer,
goto found_extant_conn;
}
 
-   rb_link_node(&conn->service_node, parent, pp);
+   rb_link_node_rcu(&conn->service_node, parent, pp);
rb_insert_color(&conn->service_node, &peer->service_conns);
 conn_published:
set_bit(RXRPC_CONN_IN_SERVICE_CONNS, &conn->flags);


Re: [net-next PATCH v2 2/2] net: samples: pktgen mode samples/tests for qdisc layer

2016-06-30 Thread John Fastabend
On 16-06-30 01:23 AM, Jesper Dangaard Brouer wrote:
> On Wed, 29 Jun 2016 13:03:26 -0700
> John Fastabend  wrote:
> 
>> This adds samples for pktgen to use with new mode to inject pkts into
>> the qdisc layer. This also doubles as nice test cases to test any
>> patches against qdisc layer.

[...]

>> +#
>> +# Benchmark script:
>> +#  - developed for benchmarking egress qdisc path, derived from
>> +#ingress benchmark script.
>> +#

As you probably gathered, 'derived' is giving me too much credit here;
it's more like cut'n'pasted from the ingress benchmark script :)

>> +# Script for injecting packets into egress qdisc path of the stack
>> +# with pktgen "xmit_mode queue_xmit".
>> +#
>> +basedir=`dirname $0`
>> +source ${basedir}/functions.sh
>> +root_check_run_with_sudo "$@"
>> +
>> +# Parameter parsing via include
>> +source ${basedir}/parameters.sh
>> +# Using invalid DST_MAC will cause the packets to get dropped in
>> +# ip_rcv() which is part of the test
>> +[ -z "$DEST_IP" ] && DEST_IP="198.18.0.42"
>> +[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
>> +
>> +# Burst greater than 1 are invalid but allow users to specify it and
>> +# get an error instead of silently ignoring it.
>> +[ -z "$BURST" ] && BURST=1
> 
> In other scripts I've rejected this at this step, instead of depending
> on failure when sending the burst option to pktgen. Like:
> 
> https://github.com/netoptimizer/network-testing/blob/master/pktgen/pktgen_sample04_many_flows.sh#L31-L33
> 

Agreed that is nicer. I had originally left it to make sure I was
catching the burst > 1 case in pktgen but will remove.

>> +
>> +# Base Config
>> +DELAY="0"# Zero means max speed
>> +COUNT="1000" # Zero means indefinitely
>> +
>> +# General cleanup everything since last run
>> +pg_ctrl "reset"
>> +
>> +# Threads are specified with parameter -t value in $THREADS
>> +for ((thread = 0; thread < $THREADS; thread++)); do
>> +# The device name is extended with @name, using thread number to
>> +# make them unique, but any name will do.
>> +dev=${DEV}@${thread}
>> +
>> +# Remove all other devices and add_device $dev to thread
>> +pg_thread $thread "rem_device_all"
>> +pg_thread $thread "add_device" $dev
>> +
>> +# Base config of dev
>> +pg_set $dev "flag QUEUE_MAP_CPU"
>> +pg_set $dev "count $COUNT"
>> +pg_set $dev "pkt_size $PKT_SIZE"
>> +pg_set $dev "delay $DELAY"
>> +pg_set $dev "flag NO_TIMESTAMP"
>> +
>> +# Destination
>> +pg_set $dev "dst_mac $DST_MAC"
>> +pg_set $dev "dst $DEST_IP"
>> +
>> +# Inject packet into RX path of stack
> 
> Hmmm, maybe above comment need to be adjusted...

Yep.

> 
>> +pg_set $dev "xmit_mode queue_xmit"
>> +
>> +# Burst allow us to avoid measuring SKB alloc/free overhead
> 
> This comment is confusing, maybe just remove. Didn't think burst is a
> valid use-case.

Yep.




Re: [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN

2016-06-30 Thread John Fastabend
On 16-06-29 11:45 PM, Jason Wang wrote:
> This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, which
> will be triggered when tx_queue_len is changed. It could be used by net
> devices that want to do some processing at that time. An example is tun,
> which may want to resize its tx array when tx_queue_len is changed.
> 
> Cc: John Fastabend 
> Signed-off-by: Jason Wang 
> ---


Thanks for adding the setlink case.

Acked-by: John Fastabend 



Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread John Fastabend
On 16-06-30 08:53 AM, Jiri Pirko wrote:
> Thu, Jun 30, 2016 at 05:40:57PM CEST, john.fastab...@gmail.com wrote:
>> On 16-06-30 03:52 AM, Jiri Pirko wrote:
>>> Thu, Jun 30, 2016 at 09:57:21AM CEST, john.fastab...@gmail.com wrote:
 On 16-06-30 12:41 AM, Jiri Pirko wrote:
> Thu, Jun 30, 2016 at 09:13:55AM CEST, sridhar.samudr...@intel.com wrote:
>>
>>
>> On 6/29/2016 11:25 PM, Jiri Pirko wrote:
>>> Thu, Jun 30, 2016 at 06:04:39AM CEST, john.fastab...@gmail.com wrote:
 On 16-06-29 08:35 PM, John Fastabend wrote:
> On 16-06-29 03:09 PM, John Fastabend wrote:
>> On 16-06-29 02:33 PM, Or Gerlitz wrote:
>>> On Wed, Jun 29, 2016 at 7:35 PM, John Fastabend
>>>  wrote:
 On 16-06-29 07:48 AM, Or Gerlitz wrote:
> On 6/28/2016 10:31 PM, John Fastabend wrote:
>> On 16-06-28 12:12 PM, Jiri Pirko wrote:
>>> Why?! Please, leave legacy be legacy. Use the new mode for
>>> implementing new features. Don't make things any more 
>>> complicated :(
>>> [...]
>> Maybe I'm reading to much into the devlink flag names and if 
>> instead
>> you use a switch like the following,
>>VF representer : enable/disable the creation VF netdev's to 
>> represent
>> the virtual functions on the PF
>> Much less complicated then magic switching between forwarding 
>> logic IMO
>> and you don't whack a default configuration that an entire stack 
>> (e.g.
>> libvirt) has been built to use.
> Re letting the user to observe/modify the rules added by the
> driver/firmware while legacy mode. Even if possible with 
> bridge/fdb, it
> will be really pragmatical and doesn't make sense to get that 
> donefor
> the TC subsystem. So this isn't a well defined solution and 
> anyway, as
> you said, legacy mode enhancements is a different exercise. 
> Personally,
> I agree with Jiri, that we should legacy be legacyand focus on 
> adding
> the new model.
 The ixgbe driver already supports bridge and tc commands without 
 the VF
 representer.  Adding the VF representer to these drivers just 
 extends
 the existing support so we have an identifier for VFs and now the
 redirect action works and the fdb commands can specify the VF 
 netdevs.
 I don't see this as a problem because we already do it today with
 'ip' and bridge tools.
>>> To be precise, for both ixgbe and mlx5, the existing tc support
>>> (u32/ixgbe, flower/mlx5) is not for switching functionality but 
>>> rather
>>> for NIC-ish one, e.g drop, mark, etc. Indeed in ixgbe you added
>>> redirect to VF, but this is only for south --> north (wire --> VF)
>>> traffic, w.o the VF rep you can't do the other way around.
>>>
>> Correct which is why we need the VF rep. So we are completely in
>> sync there.
>>
>>> Just to clarify, to what exact bridge command support did you refer 
>>> for ixgbe?
>> 'bridge fdb' commands are supported today on the PF. But its the
>> same story as above we need the VF rep to also use it on the
>> VF representer
>>
>> Also 'bridge link' command for veb/vepa modes is supported and the
>> other link attributes could be supported with additional driver
>> support. No need for core changes here. But again yes only on the
>> PF so again we need the VF reps.
>>
>>> The forwarding done in the legacy mode is not well defined, and
>>> different across vendors, adding there the VF reps will not make it
>>> any better b/c some steering rules will be set by tc/bridge offloads
>>> while other rules will be put by the driver.
>>> I don't see how this takes us to better place.
>> In legacy mode or any other mode you are defining some default policy
>> and rules.
>>
>> In the legacy mode we use mac/vlan assigned l2 forwarding entries in 
>> the
>> hardware fdb which are seen when you query 'ip link' and 'bridge fdb'
>> today. And similarly can be modified today using 'ip link' and 
>> 'bridge
>> fdb' at least on the intel devices. Its not undefined in any way with
>> a quick query of the tools we can learn exactly what the 
>> configuration
>> is and even change it. This works fairly well with existing 
>> controllers
>> and stacks.
>>
>> The limitations are 'ip' only supports a single MAC address per 

Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))

2016-06-30 Thread Steven Rostedt
On Thu, 30 Jun 2016 11:23:41 -0400
Steven Rostedt  wrote:


> I can add more trace_printk()s if it would help.

I added a trace_printk() in inet_bind_bucket_destroy() to print out
some information on the socket used by xs_bind(), and it shows that the
bind destroy is called, but the list is not empty.



/*
 * Caller must hold hashbucket lock for this tb with local BH disabled
 */
void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket *tb)
{
	if (!current->mm && xs_port == tb->port) {
		trace_printk("destroy %d empty=%d %p\n",
			     tb->port, hlist_empty(&tb->owners), tb);
		trace_dump_stack(1);
	}
	if (hlist_empty(&tb->owners)) {
		__hlist_del(&tb->node);
		kmem_cache_free(cachep, tb);
	}
}

I created "xs_port" to hold the port used by xs_bind(), and when the
destroy is called, hlist_empty(&tb->owners) returns false.

I'll add more trace_printks to find out where those owners are being
added.

-- Steve


Re: [PATCH net-next 19/19] rxrpc: Use RCU to access a peer's service connection tree

2016-06-30 Thread David Howells
Peter Zijlstra  wrote:

> > +   if (conn->proto.index_key < k.index_key)
> > +   p = p->rb_left;
> > +   else if (conn->proto.index_key > k.index_key)
> > +   p = p->rb_right;
> 
> You still very much need rcu_dereference() for both left and right
> pointers. As well as the first p load.

Bah...  Yes.  Good point.

> > +   rb_link_node(&conn->service_node, parent, pp);
> 
> You want rb_link_node_rcu() here.

Should there be an rb_replace_node_rcu() also?

David


RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Friday, July 1, 2016 0:12
> To: Dexuan Cui 
> Cc: da...@davemloft.net; gre...@linuxfoundation.org;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; a...@canonical.com; jasow...@redhat.com;
> Vitaly Kuznetsov ; Cathy Avery ;
> KY Srinivasan ; Haiyang Zhang
> ; j...@perches.com; Rolf Neugebauer
> 
> Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On Thu, Jun 30, Dexuan Cui wrote:
> 
> > -#define AF_MAX 43  /* For now.. */
> > +#define AF_MAX 44  /* For now.. */
> 
> Should this patch also change the places where AF_MAX is used,
> like all the arrays in net/core/sock.c?
> 
> Olaf

Thanks for the reminder, Olaf!

I think we may as well make a separate patch for this.
It is on my to-do list.

Thanks,
-- Dexuan


Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Olaf Hering
On Thu, Jun 30, Dexuan Cui wrote:

> -#define AF_MAX   43  /* For now.. */
> +#define AF_MAX   44  /* For now.. */

Should this patch also change the places where AF_MAX is used,
like all the arrays in net/core/sock.c?

Olaf
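
To illustrate the concern about tables sized by AF_MAX, here is a
small standalone sketch. It is not the net/core/sock.c code: the
family_names[] table and family_name() helper are hypothetical, and
only the AF_QIPCRTR, AF_HYPERV and AF_MAX values are taken from the
patch. With a table indexed by address family, bumping AF_MAX without
adding an entry for the new family leaves that slot NULL.

```c
#include <stddef.h>

/* Values as in the include/linux/socket.h hunk of the patch. */
#define AF_QIPCRTR	42
#define AF_HYPERV	43
#define AF_MAX		44

/*
 * Hypothetical per-family table, indexed by address family.
 * Any family without an explicit entry is left as NULL.
 */
static const char *const family_names[AF_MAX + 1] = {
	[AF_QIPCRTR]	= "sk_lock-AF_QIPCRTR",
	[AF_HYPERV]	= "sk_lock-AF_HYPERV",	/* the entry that is easy to forget */
	[AF_MAX]	= "sk_lock-AF_MAX",
};

/* Look up a family name, tolerating out-of-range and missing entries. */
static const char *family_name(int family)
{
	if (family < 0 || family > AF_MAX || !family_names[family])
		return "unknown";
	return family_names[family];
}
```

In this sketch, dropping the [AF_HYPERV] line still compiles, and
family_name(AF_HYPERV) then quietly falls back to "unknown", which is
why each such table needs checking when a family is added.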




[PATCH v14 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-06-30 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by
introducing a new socket address family AF_HYPERV.

You can also get the patch by:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v14

Note: the VMBus driver side's supporting patches have been in the mainline
tree.

I know the kernel has already had a VM Sockets driver (AF_VSOCK) based
on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing AF_VSOCK of virtio version:
http://marc.info/?l=linux-netdev&m=145952064004765&w=2

However, though Hyper-V Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is: <GUID VM_ID, GUID ServiceID>. Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch.

Looking forward to your comments, especially comments from David. :-)

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about V4.
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (Thank Vitaly!)
- avoided the introduction of new VMBUS driver APIs vmbus_sendpacket_hvsock()
and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket()
in the higher level (i.e., the vmsock driver). Thank Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html)
- only a few minor changes of coding style and comments

Changes since v7
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Changes since v8
- removed the unnecessary __packed for some definitions: thanks, David!
- hvsock_open_connection: use offer.u.pipe.user_def[0] to know the connection
direction and reorganized the function
- reorganized the code according to suggestions from Cathy Avery: split big
functions into small ones, set .setsockopt and getsockopt to
sock_no_setsockopt/sock_no_getsockopt
- inline'd some small list helper functions

Changes since v9
- minimized struct hvsock_sock by making the send/recv buffers pointers.
   the buffers are allocated by kmalloc() in __hvsock_create() now.
- minimized the sizes of the send/recv buffers and the vmbus ringbuffers.

Changes since v10

1) add module params: send_ring_page, recv_ring_page. They can be used to
enlarge the ringbuffer size to get better performance, e.g.,
# modprobe hv_sock  recv_ring_page=16 send_ring_page=16
By default, recv_ring_page is 3 and send_ring_page is 2.

2) add module param max_socket_number (the default is 1024).
A user can enlarge the number to create more than 1024 hv_sock sockets.
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 

[PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Vitaly Kuznetsov 
Cc: Cathy Avery 
---

You can also get the patch here (8ba95c8ec9):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v14

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

In v14:
fix some coding style issues pointed out by David.

Looking forward to your comments!
 MAINTAINERS |2 +
 include/linux/hyperv.h  |   13 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   59 ++
 include/uapi/linux/hyperv.h |   24 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1519 +++
 10 files changed, 1635 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel 
*channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. */
+#define AF_MAX 44  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -251,6 +252,7 @@ struct ucred {
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
 #define PF_QIPCRTR AF_QIPCRTR
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h 

Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread Jiri Pirko
Thu, Jun 30, 2016 at 05:40:57PM CEST, john.fastab...@gmail.com wrote:
>On 16-06-30 03:52 AM, Jiri Pirko wrote:
>> Thu, Jun 30, 2016 at 09:57:21AM CEST, john.fastab...@gmail.com wrote:
>>> On 16-06-30 12:41 AM, Jiri Pirko wrote:
 Thu, Jun 30, 2016 at 09:13:55AM CEST, sridhar.samudr...@intel.com wrote:
>
>
> On 6/29/2016 11:25 PM, Jiri Pirko wrote:
>> Thu, Jun 30, 2016 at 06:04:39AM CEST, john.fastab...@gmail.com wrote:
>>> On 16-06-29 08:35 PM, John Fastabend wrote:
 On 16-06-29 03:09 PM, John Fastabend wrote:
> On 16-06-29 02:33 PM, Or Gerlitz wrote:
>> On Wed, Jun 29, 2016 at 7:35 PM, John Fastabend
>>  wrote:
>>> On 16-06-29 07:48 AM, Or Gerlitz wrote:
 On 6/28/2016 10:31 PM, John Fastabend wrote:
> On 16-06-28 12:12 PM, Jiri Pirko wrote:
>> Why?! Please, leave legacy be legacy. Use the new mode for
>> implementing new features. Don't make things any more 
>> complicated :(
>> [...]
> Maybe I'm reading to much into the devlink flag names and if 
> instead
> you use a switch like the following,
>VF representer : enable/disable the creation VF netdev's to 
> represent
> the virtual functions on the PF
> Much less complicated then magic switching between forwarding 
> logic IMO
> and you don't whack a default configuration that an entire stack 
> (e.g.
> libvirt) has been built to use.
 Re letting the user to observe/modify the rules added by the
 driver/firmware while legacy mode. Even if possible with 
 bridge/fdb, it
 will be really pragmatical and doesn't make sense to get that 
 donefor
 the TC subsystem. So this isn't a well defined solution and 
 anyway, as
 you said, legacy mode enhancements is a different exercise. 
 Personally,
 I agree with Jiri, that we should legacy be legacyand focus on 
 adding
 the new model.
>>> The ixgbe driver already supports bridge and tc commands without 
>>> the VF
>>> representer.  Adding the VF representer to these drivers just 
>>> extends
>>> the existing support so we have an identifier for VFs and now the
>>> redirect action works and the fdb commands can specify the VF 
>>> netdevs.
>>> I don't see this as a problem because we already do it today with
>>> 'ip' and bridge tools.
>> To be precise, for both ixgbe and mlx5, the existing tc support
>> (u32/ixgbe, flower/mlx5) is not for switching functionality but 
>> rather
>> for NIC-ish one, e.g drop, mark, etc. Indeed in ixgbe you added
>> redirect to VF, but this is only for south --> north (wire --> VF)
>> traffic, w.o the VF rep you can't do the other way around.
>>
> Correct which is why we need the VF rep. So we are completely in
> sync there.
>
>> Just to clarify, to what exact bridge command support did you refer 
>> for ixgbe?
> 'bridge fdb' commands are supported today on the PF. But its the
> same story as above we need the VF rep to also use it on the
> VF representer
>
> Also 'bridge link' command for veb/vepa modes is supported and the
> other link attributes could be supported with additional driver
> support. No need for core changes here. But again yes only on the
> PF so again we need the VF reps.
>
>> The forwarding done in the legacy mode is not well defined, and
>> different across vendors, adding there the VF reps will not make it
>> any better b/c some steering rules will be set by tc/bridge offloads
>> while other rules will be put by the driver.
>> I don't see how this takes us to better place.
> In legacy mode or any other mode you are defining some default policy
> and rules.
>
> In the legacy mode we use mac/vlan assigned l2 forwarding entries in 
> the
> hardware fdb which are seen when you query 'ip link' and 'bridge fdb'
> today. And similarly can be modified today using 'ip link' and 'bridge
> fdb' at least on the intel devices. Its not undefined in any way with
> a quick query of the tools we can learn exactly what the configuration
> is and even change it. This works fairly well with existing 
> controllers
> and stacks.
>
> The limitations are 'ip' only supports a single MAC address per VF and
> 'tc' doesn't work on VF ports because when the VF is assigned to a VM
> or namespace we lose visibility of it. Providing a VF rep for 

Re: [PATCH net-next 2/3] cxgb4/cxgb4vf: Add set VF mac address support

2016-06-30 Thread Hariprasad Shenai
On Thu, Jun 30, 2016 at 13:13:15 +, Yuval Mintz wrote:
> > +   /* verify MAC addr is valid */
> > +   if (!is_zero_ether_addr(mac) && !is_valid_ether_addr(mac) &&
> > +   is_multicast_ether_addr(mac)) {
> 
> This is really odd as verification goes; currently this is a very elaborate
> way of checking for multicast, but I guess it's probably a mistake.
> 
My bad, will send a V2
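
The observation that the quoted condition reduces to a multicast check
can be confirmed with a small userspace sketch. The three helpers below
are simplified stand-ins written for this example (they mirror the
documented behaviour of the kernel helpers of the same names, but are
not the kernel implementations), and quoted_check_rejects() is a
hypothetical wrapper around the condition exactly as it appears in the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Userspace stand-in: true if all six bytes are zero. */
static bool is_zero_ether_addr(const uint8_t *a)
{
	for (int i = 0; i < ETH_ALEN; i++)
		if (a[i])
			return false;
	return true;
}

/* Userspace stand-in: the group bit is the low bit of the first byte. */
static bool is_multicast_ether_addr(const uint8_t *a)
{
	return a[0] & 0x01;
}

/* Userspace stand-in: valid means neither multicast nor all-zero. */
static bool is_valid_ether_addr(const uint8_t *a)
{
	return !is_multicast_ether_addr(a) && !is_zero_ether_addr(a);
}

/* The condition exactly as quoted from the patch. */
static bool quoted_check_rejects(const uint8_t *mac)
{
	return !is_zero_ether_addr(mac) && !is_valid_ether_addr(mac) &&
	       is_multicast_ether_addr(mac);
}
```

Since "not valid" means "multicast or zero", and a multicast address
is never all-zero, the three-way AND collapses to
is_multicast_ether_addr(mac) alone: the all-zero address and every
unicast address pass through unchecked.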


Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun

2016-06-30 Thread Michael S. Tsirkin
On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote:
> Hi all:
> 
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introduce a tx skb array and use
> it instead of sk_receive_queue.
> 
> A minor issue is to keep the tx_queue_len behaviour, since tun used to
> use it for the length of sk_receive_queue. This is done through:
> 
> - add the ability to resize multiple rings at once to avoid handling
>   partial resize failure for multiple rings.
> - add the support for zero length ring.
> - introduce a notifier which was triggered when tx_queue_len was
>   changed for a netdev.
> - resize all queues during the tx_queue_len changing.
> 
> Tests show about 15% improvement on guest rx pps:
> 
> Before: ~130pps
> After : ~150pps

Acked-by: Michael S. Tsirkin 

Acked-from-altitude: 34697 feet.


> Changes from V3:
> - fix kbuild warnings
> - call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN
> 
> Changes from V2:
> - add multiple rings resizing support for ptr_ring/skb_array
> - add zero length ring support
> - introduce a NETDEV_CHANGE_TX_QUEUE_LEN
> - drop new flags
> 
> Changes from V1:
> - switch to use skb array instead of a customized circular buffer
> - add non-blocking support
> - rename .peek to .peek_len
> - drop lockless peeking since test show very minor improvement
> 
> Jason Wang (5):
>   ptr_ring: support zero length ring
>   skb_array: minor tweak
>   skb_array: add wrappers for resizing
>   net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
>   tun: switch to use skb array for tx
> 
> Michael S. Tsirkin (1):
>   ptr_ring: support resizing multiple queues
> 
>  drivers/net/tun.c| 138 
> ---
>  drivers/vhost/net.c  |  16 -
>  include/linux/net.h  |   1 +
>  include/linux/netdevice.h|   1 +
>  include/linux/ptr_ring.h |  77 ++
>  include/linux/skb_array.h|  13 +++-
>  net/core/net-sysfs.c |  15 -
>  net/core/rtnetlink.c |  16 +++--
>  tools/virtio/ringtest/ptr_ring.c |   5 ++
>  9 files changed, 255 insertions(+), 27 deletions(-)
> 
> -- 
> 2.7.4


Re: [PATCH net-next V2 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread Jiri Pirko
Thu, Jun 30, 2016 at 05:23:27PM CEST, sae...@mellanox.com wrote:
>From: Or Gerlitz 
>
>Add the commands to set and show the mode of SRIOV E-Switch, two modes
>are supported:
>
>* legacy: operating in the "old" L2 based mode (DMAC --> VF vport)
>
>* switchdev: the E-Switch is referred to as whitebox switch configured
>using standard tools such as tc, bridge, openvswitch etc. To allow
>working with the tools, for each VF, a VF representor netdevice is
>created by the E-Switch manager vendor device driver instance (e.g PF).
>
>Signed-off-by: Or Gerlitz 
>Signed-off-by: Saeed Mahameed 

Acked-by: Jiri Pirko 


Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread John Fastabend
On 16-06-30 03:52 AM, Jiri Pirko wrote:
> Thu, Jun 30, 2016 at 09:57:21AM CEST, john.fastab...@gmail.com wrote:
>> On 16-06-30 12:41 AM, Jiri Pirko wrote:
>>> Thu, Jun 30, 2016 at 09:13:55AM CEST, sridhar.samudr...@intel.com wrote:


 On 6/29/2016 11:25 PM, Jiri Pirko wrote:
> Thu, Jun 30, 2016 at 06:04:39AM CEST, john.fastab...@gmail.com wrote:
>> On 16-06-29 08:35 PM, John Fastabend wrote:
>>> On 16-06-29 03:09 PM, John Fastabend wrote:
 On 16-06-29 02:33 PM, Or Gerlitz wrote:
> On Wed, Jun 29, 2016 at 7:35 PM, John Fastabend
>  wrote:
>> On 16-06-29 07:48 AM, Or Gerlitz wrote:
>>> On 6/28/2016 10:31 PM, John Fastabend wrote:
 On 16-06-28 12:12 PM, Jiri Pirko wrote:
> Why?! Please, leave legacy be legacy. Use the new mode for
> implementing new features. Don't make things any more complicated 
> :(
> [...]
 Maybe I'm reading to much into the devlink flag names and if 
 instead
 you use a switch like the following,
VF representer : enable/disable the creation VF netdev's to 
 represent
 the virtual functions on the PF
 Much less complicated then magic switching between forwarding 
 logic IMO
 and you don't whack a default configuration that an entire stack 
 (e.g.
 libvirt) has been built to use.
>>> Re letting the user to observe/modify the rules added by the
>>> driver/firmware while legacy mode. Even if possible with 
>>> bridge/fdb, it
>>> will be really pragmatical and doesn't make sense to get that 
>>> donefor
>>> the TC subsystem. So this isn't a well defined solution and anyway, 
>>> as
>>> you said, legacy mode enhancements is a different exercise. 
>>> Personally,
>>> I agree with Jiri, that we should legacy be legacyand focus on 
>>> adding
>>> the new model.
>> The ixgbe driver already supports bridge and tc commands without the 
>> VF
>> representer.  Adding the VF representer to these drivers just extends
>> the existing support so we have an identifier for VFs and now the
>> redirect action works and the fdb commands can specify the VF 
>> netdevs.
>> I don't see this as a problem because we already do it today with
>> 'ip' and bridge tools.
> To be precise, for both ixgbe and mlx5, the existing tc support
> (u32/ixgbe, flower/mlx5) is not for switching functionality but rather
> for NIC-ish functionality, e.g. drop, mark, etc. Indeed in ixgbe you
> added redirect to VF, but this is only for south --> north
> (wire --> VF) traffic; w/o the VF rep you can't do the other way around.
>
 Correct, which is why we need the VF rep. So we are completely in
 sync there.

> Just to clarify, to what exact bridge command support did you refer
> for ixgbe?
 'bridge fdb' commands are supported today on the PF. But it's the
 same story as above: we need the VF rep to also use it on the
 VF representer.

 Also 'bridge link' command for veb/vepa modes is supported and the
 other link attributes could be supported with additional driver
 support. No need for core changes here. But again yes only on the
 PF so again we need the VF reps.

> The forwarding done in the legacy mode is not well defined, and
> differs across vendors. Adding the VF reps there will not make it
> any better b/c some steering rules will be set by tc/bridge offloads
> while other rules will be put by the driver.
> I don't see how this takes us to a better place.
 In legacy mode or any other mode you are defining some default policy
 and rules.

 In the legacy mode we use mac/vlan assigned l2 forwarding entries in
 the hardware fdb, which are seen when you query 'ip link' and
 'bridge fdb' today. And similarly they can be modified today using
 'ip link' and 'bridge fdb', at least on the Intel devices. It's not
 undefined in any way: with a quick query of the tools we can learn
 exactly what the configuration is and even change it. This works
 fairly well with existing controllers and stacks.

 The limitations are 'ip' only supports a single MAC address per VF and
 'tc' doesn't work on VF ports because when the VF is assigned to a VM
 or namespace we lose visibility of it. Providing a VF rep for this
 solves both of those problems.

 In this new mode the default policy is to create a default miss rule
 and implement no l2 forwarding rules. Unfortunately not 

Re: [PATCH 1/1] qlcnic: add wmb() call in transmit data path.

2016-06-30 Thread Lino Sanfilippo



On 30.06.2016 17:32, Lino Sanfilippo wrote:

Hi,

On 29.06.2016 23:51, Sony Chacko wrote:


+/* Ensure writes are complete before HW fetches Tx descriptors */
+wmb();
  qlcnic_update_cmd_producer(tx_ring);

  return NETDEV_TX_OK;



Would not an mmiowb be more appropriate in this case?

Regards,
Lino


Sorry, this was nonsense. This should be "dma_wmb" not "mmiowb".

Regards,
Lino
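
For readers following the barrier question above, here is a minimal
userspace sketch of the ordering the patch comment describes: the
descriptor writes must be visible before the producer index is
published. It is only an analogy and rests on assumptions. The kernel
primitives wmb(), dma_wmb() and mmiowb() are not available in
userspace, so a C11 release fence stands in for the write barrier, an
atomic variable stands in for the device's producer register, and the
names tx_ring, tx_desc, tx_ring_post, tx_ring_fetch and TX_RING_SIZE
are hypothetical and not taken from qlcnic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RING_SIZE 8u /* hypothetical size; must be a power of two */

/* Hypothetical descriptor layout, not the qlcnic one. */
struct tx_desc {
    uint64_t buf_addr;
    uint32_t len;
};

struct tx_ring {
    struct tx_desc desc[TX_RING_SIZE];
    _Atomic uint32_t producer; /* stand-in for the producer "doorbell" */
    _Atomic uint32_t consumer; /* stand-in for the device's fetch index */
};

/* Driver side: fill a descriptor, then publish it.
 * Returns 0 on success, -1 if the ring is full. */
static int tx_ring_post(struct tx_ring *ring, uint64_t addr, uint32_t len)
{
    assert(ring != NULL);

    uint32_t prod = atomic_load_explicit(&ring->producer, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&ring->consumer, memory_order_acquire);

    if (prod - cons == TX_RING_SIZE)
        return -1;

    struct tx_desc *d = &ring->desc[prod & (TX_RING_SIZE - 1)];
    d->buf_addr = addr;
    d->len = len;

    /* Userspace stand-in for the write barrier in the patch: the
     * descriptor stores above must not be reordered after the
     * producer update below. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&ring->producer, prod + 1, memory_order_relaxed);
    return 0;
}

/* "Device" side: fetch one descriptor if any is pending.
 * Returns 1 if a descriptor was copied to *out, 0 otherwise. */
static int tx_ring_fetch(struct tx_ring *ring, struct tx_desc *out)
{
    uint32_t cons = atomic_load_explicit(&ring->consumer, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&ring->producer, memory_order_relaxed);

    if (cons == prod)
        return 0;

    /* Pairs with the release fence in tx_ring_post(). */
    atomic_thread_fence(memory_order_acquire);

    *out = ring->desc[cons & (TX_RING_SIZE - 1)];
    atomic_store_explicit(&ring->consumer, cons + 1, memory_order_release);
    return 1;
}
```

Without the release fence, the consumer could in principle observe the
new producer value and still read a stale descriptor, which is the
class of problem the patch comment is about. Which kernel barrier is
the right one for the real driver depends on whether the final write
is an MMIO access or a store to coherent DMA memory, and this sketch
does not settle that.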

