Re: [RFC v2 03/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) netdev

2016-12-18 Thread Vishwanathapura, Niranjana

On Thu, Dec 15, 2016 at 09:24:20PM -0700, Jason Gunthorpe wrote:

>>+struct __hfi_vesw_info {
>>+   u16  fabric_id;
>>+   u16  vesw_id;
>>+
>>+   u8   rsvd0[6];
>>+   u16  def_port_mask;
>>+
>>+   u8   rsvd1[2];
>>+   u16  pkey;
>>+
>>+   u8   rsvd2[4];
>>+   u32  u_mcast_dlid;
>>+   u32  u_ucast_dlid[HFI_VESW_MAX_NUM_DEF_PORT];
>>+
>>+   u8   rsvd3[44];
>>+   u16  eth_mtu[HFI_VNIC_MAX_NUM_PCP];
>>+   u16  eth_mtu_non_vlan;
>>+   u8   rsvd4[2];
>>+} __packed;
>
>This goes on the network too? Also looks like it has endian problems.
>
>Ditto for all the __packed structures.
>

This is in CPU format. There is a separate big endian version of
this


Why are CPU handled structures packed and full of reserved fields?
Don't pack them if they are not pushed out to the network..

There were lots of __packed structures, any that go on the network
need be/le annoations.



Well, driver treats the reserved fields to be sticky. ie., information
block returned (upon GET) to EM is not changed (from SET) except few fields 
which driver is expected to modify.
Structures that go on wire are big endian __packed structures in 
hfi_vnic_encap.h. Ok, I will remove the __packed attribute from CPU handled 
structures here.


Niranjana


Jason


RE: [PATCH 3/3] net: fec: Fix typo in error msg and comment

2016-12-18 Thread Andy Duan
From: Peter Meerwald-Stadler  Sent: Friday, December 16, 
2016 9:24 PM
>To: netdev@vger.kernel.org
>Cc: Peter Meerwald-Stadler ; Andy Duan
>; triv...@kernel.org
>Subject: [PATCH 3/3] net: fec: Fix typo in error msg and comment
>
>Signed-off-by: Peter Meerwald-Stadler 
>Cc: Fugang Duan 
>Cc: triv...@kernel.org
>---
> drivers/net/ethernet/freescale/fec_main.c |4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>

Acked-by: Fugang Duan 

>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index 38160c2..250b790 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -2001,7 +2001,7 @@ static int fec_enet_mii_init(struct platform_device
>*pdev)
>   mii_speed--;
>   if (mii_speed > 63) {
>   dev_err(>dev,
>-  "fec clock (%lu) to fast to get right mii speed\n",
>+  "fec clock (%lu) too fast to get right mii speed\n",
>   clk_get_rate(fep->clk_ipg));
>   err = -EINVAL;
>   goto err_out;
>@@ -2950,7 +2950,7 @@ static void set_multicast_list(struct net_device
>*ndev)
>   }
>
>   /* only upper 6 bits (FEC_HASH_BITS) are used
>-   * which point to specific bit in he hash registers
>+   * which point to specific bit in the hash registers
>*/
>   hash = (crc >> (32 - FEC_HASH_BITS)) & 0x3f;
>
>--
>1.7.10.4


Re: [PATCH iproute2 v2 1/3] ifstat: Add extended statistics to ifstat

2016-12-18 Thread Or Gerlitz
On Thu, Dec 15, 2016 at 3:00 PM, Nogah Frankel  wrote:

> +/* Note: if one xstat name in subset of another, it should be before it in 
> this
> + * list.
> + * Name length must be under 64 chars.
> + */

nit "in subset" --> "is subset" ?


[PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address

2016-12-18 Thread fgao
From: Gao Feng 

There are some duplicated codes in ipvlan_add_addr6/4 and
ipvlan_del_addr6/4. Now define two common functions ipvlan_add_addr
and ipvlan_del_addr to decrease the duplicated codes.
It could be helful to maintain the codes.

Signed-off-by: Gao Feng 
---
 It is sent again because the first email is sent during net-next closing.

 drivers/net/ipvlan/ipvlan_main.c | 68 +---
 1 file changed, 29 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 693ec5b..5874d30 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -669,23 +669,22 @@ static int ipvlan_device_event(struct notifier_block 
*unused,
return NOTIFY_DONE;
 }
 
-static int ipvlan_add_addr6(struct ipvl_dev *ipvlan, struct in6_addr *ip6_addr)
+static int ipvlan_add_addr(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6)
 {
struct ipvl_addr *addr;
 
-   if (ipvlan_addr_busy(ipvlan->port, ip6_addr, true)) {
-   netif_err(ipvlan, ifup, ipvlan->dev,
- "Failed to add IPv6=%pI6c addr for %s intf\n",
- ip6_addr, ipvlan->dev->name);
-   return -EINVAL;
-   }
addr = kzalloc(sizeof(struct ipvl_addr), GFP_ATOMIC);
if (!addr)
return -ENOMEM;
 
addr->master = ipvlan;
-   memcpy(>ip6addr, ip6_addr, sizeof(struct in6_addr));
-   addr->atype = IPVL_IPV6;
+   if (is_v6) {
+   memcpy(>ip6addr, iaddr, sizeof(struct in6_addr));
+   addr->atype = IPVL_IPV6;
+   } else {
+   memcpy(>ip4addr, iaddr, sizeof(struct in_addr));
+   addr->atype = IPVL_IPV4;
+   }
list_add_tail(>anode, >addrs);
 
/* If the interface is not up, the address will be added to the hash
@@ -697,11 +696,11 @@ static int ipvlan_add_addr6(struct ipvl_dev *ipvlan, 
struct in6_addr *ip6_addr)
return 0;
 }
 
-static void ipvlan_del_addr6(struct ipvl_dev *ipvlan, struct in6_addr 
*ip6_addr)
+static void ipvlan_del_addr(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6)
 {
struct ipvl_addr *addr;
 
-   addr = ipvlan_find_addr(ipvlan, ip6_addr, true);
+   addr = ipvlan_find_addr(ipvlan, iaddr, is_v6);
if (!addr)
return;
 
@@ -712,6 +711,23 @@ static void ipvlan_del_addr6(struct ipvl_dev *ipvlan, 
struct in6_addr *ip6_addr)
return;
 }
 
+static int ipvlan_add_addr6(struct ipvl_dev *ipvlan, struct in6_addr *ip6_addr)
+{
+   if (ipvlan_addr_busy(ipvlan->port, ip6_addr, true)) {
+   netif_err(ipvlan, ifup, ipvlan->dev,
+ "Failed to add IPv6=%pI6c addr for %s intf\n",
+ ip6_addr, ipvlan->dev->name);
+   return -EINVAL;
+   }
+
+   return ipvlan_add_addr(ipvlan, ip6_addr, true);
+}
+
+static void ipvlan_del_addr6(struct ipvl_dev *ipvlan, struct in6_addr 
*ip6_addr)
+{
+   return ipvlan_del_addr(ipvlan, ip6_addr, true);
+}
+
 static int ipvlan_addr6_event(struct notifier_block *unused,
  unsigned long event, void *ptr)
 {
@@ -745,45 +761,19 @@ static int ipvlan_addr6_event(struct notifier_block 
*unused,
 
 static int ipvlan_add_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
 {
-   struct ipvl_addr *addr;
-
if (ipvlan_addr_busy(ipvlan->port, ip4_addr, false)) {
netif_err(ipvlan, ifup, ipvlan->dev,
  "Failed to add IPv4=%pI4 on %s intf.\n",
  ip4_addr, ipvlan->dev->name);
return -EINVAL;
}
-   addr = kzalloc(sizeof(struct ipvl_addr), GFP_KERNEL);
-   if (!addr)
-   return -ENOMEM;
-
-   addr->master = ipvlan;
-   memcpy(>ip4addr, ip4_addr, sizeof(struct in_addr));
-   addr->atype = IPVL_IPV4;
-   list_add_tail(>anode, >addrs);
-
-   /* If the interface is not up, the address will be added to the hash
-* list by ipvlan_open.
-*/
-   if (netif_running(ipvlan->dev))
-   ipvlan_ht_addr_add(ipvlan, addr);
 
-   return 0;
+   return ipvlan_add_addr(ipvlan, ip4_addr, false);
 }
 
 static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
 {
-   struct ipvl_addr *addr;
-
-   addr = ipvlan_find_addr(ipvlan, ip4_addr, false);
-   if (!addr)
-   return;
-
-   ipvlan_ht_addr_del(addr);
-   list_del(>anode);
-   kfree_rcu(addr, rcu);
-
-   return;
+   return ipvlan_del_addr(ipvlan, ip4_addr, false);
 }
 
 static int ipvlan_addr4_event(struct notifier_block *unused,
-- 
1.9.1




[PATCH net-next 1/1] driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rx

2016-12-18 Thread fgao
From: Gao Feng 

There are three functions which would invoke the ipvlan_count_rx. They
are ipvlan_process_multicast, ipvlan_rcv_frame, and ipvlan_nf_input.
The former two functions already use the ipvlan directly before
ipvlan_count_rx, and ipvlan_nf_input gets the ipvlan from
ipvl_addr->master, it is not possible to be NULL too.
So the ipvlan pointer check is unnecessary in ipvlan_count_rx.

Signed-off-by: Gao Feng 
---
 drivers/net/ipvlan/ipvlan_core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index b4e9907..082f9f1 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -19,9 +19,6 @@ void ipvlan_init_secret(void)
 static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
unsigned int len, bool success, bool mcast)
 {
-   if (!ipvlan)
-   return;
-
if (likely(success)) {
struct ipvl_pcpu_stats *pcptr;
 
-- 
1.9.1




Re: [PATCH net] netfilter: check duplicate config when initializing in ipt_CLUSTERIP

2016-12-18 Thread Marcelo Ricardo Leitner
On Thu, Dec 15, 2016 at 12:31:40PM +0800, Xin Long wrote:
> Now when adding an ipt_CLUSTERIP rule, it only checks duplicate config in
> clusterip_config_find_get(). But after that, there may be still another
> thread to insert a config with the same ip, then it leaves proc_create_data
> to do duplicate check.
> 
> It's more reasonable to check duplicate config by ipt_CLUSTERIP itself,
> instead of checking it by proc fs duplicate file check. Before, when proc
> fs allowed duplicate name files in a directory, It could even crash kernel
> because of use-after-free.
> 
> This patch is to check duplicate config under the protection of clusterip
> net lock when initializing a new config.
> 
> Note that it also moves proc file node creation after adding new config, as
> proc_create_data may sleep, it couldn't be called under the clusterip_net
> lock. clusterip_config_find_get returns NULL if c->pde is null to make sure
> it can't be used until the proc file node creation is done.
> 
> Suggested-by: Marcelo Ricardo Leitner 
> Signed-off-by: Xin Long 

Reviewed-by: Marcelo Ricardo Leitner 

> ---
>  net/ipv4/netfilter/ipt_CLUSTERIP.c | 26 +++---
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
> b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> index 21db00d..0e71cac 100644
> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> @@ -144,7 +144,7 @@ clusterip_config_find_get(struct net *net, __be32 
> clusterip, int entry)
>   rcu_read_lock_bh();
>   c = __clusterip_config_find(net, clusterip);
>   if (c) {
> - if (unlikely(!atomic_inc_not_zero(>refcount)))
> + if (!c->pde || unlikely(!atomic_inc_not_zero(>refcount)))
>   c = NULL;
>   else if (entry)
>   atomic_inc(>entries);
> @@ -166,10 +166,11 @@ clusterip_config_init_nodelist(struct clusterip_config 
> *c,
>  
>  static struct clusterip_config *
>  clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip,
> - struct net_device *dev)
> +   struct net_device *dev)
>  {
> + struct net *net = dev_net(dev);
>   struct clusterip_config *c;
> - struct clusterip_net *cn = net_generic(dev_net(dev), clusterip_net_id);
> + struct clusterip_net *cn = net_generic(net, clusterip_net_id);
>  
>   c = kzalloc(sizeof(*c), GFP_ATOMIC);
>   if (!c)
> @@ -185,6 +186,17 @@ clusterip_config_init(const struct 
> ipt_clusterip_tgt_info *i, __be32 ip,
>   atomic_set(>refcount, 1);
>   atomic_set(>entries, 1);
>  
> + spin_lock_bh(>lock);
> + if (__clusterip_config_find(net, ip)) {
> + spin_unlock_bh(>lock);
> + kfree(c);
> +
> + return NULL;
> + }
> +
> + list_add_rcu(>list, >configs);
> + spin_unlock_bh(>lock);
> +
>  #ifdef CONFIG_PROC_FS
>   {
>   char buffer[16];
> @@ -195,16 +207,16 @@ clusterip_config_init(const struct 
> ipt_clusterip_tgt_info *i, __be32 ip,
> cn->procdir,
> _proc_fops, c);
>   if (!c->pde) {
> + spin_lock_bh(>lock);
> + list_del_rcu(>list);
> + spin_unlock_bh(>lock);
>   kfree(c);
> +
>   return NULL;
>   }
>   }
>  #endif
>  
> - spin_lock_bh(>lock);
> - list_add_rcu(>list, >configs);
> - spin_unlock_bh(>lock);
> -
>   return c;
>  }
>  
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


RE: [PATCH iproute2 v2 1/3] ifstat: Add extended statistics to ifstat

2016-12-18 Thread Nogah Frankel
> -Original Message-
> From: Rami Rosen [mailto:roszenr...@gmail.com]
> Sent: Friday, December 16, 2016 5:21 PM
> To: Nogah Frankel 
> Cc: Stephen Hemminger ; netdev@vger.kernel.org;
> ro...@cumulusnetworks.com; Jiri Pirko ; Elad Raz
> ; Yotam Gigi ; Ido Schimmel
> ; Or Gerlitz 
> Subject: Re: [PATCH iproute2 v2 1/3] ifstat: Add extended statistics to ifstat
> 
> Hi,
> 
> >Thanks, I'll fix it.
> Another minor nit, on this occasion:
> 
> bool is_extanded should be: bool is_extended
> 
> Regards,
> Rami Rosen

Thank you, I will fix it.

Nogah


[PATCH] at86rf230: Allow slow GPIO pins for "rstn"

2016-12-18 Thread Andrey Smirnov
Driver code never touches "rstn" signal in atomic context, so there's
no need to implicitly put such restriction on it by using gpio_set_value
to manipulate it. Replace gpio_set_value to gpio_set_value_cansleep to
fix that.

As a an example of where such restriction might be inconvenient,
consider a hardware design where "rstn" is connected to a pin of I2C/SPI
GPIO expander chip.

Cc: Chris Healy 
Signed-off-by: Andrey Smirnov 
---
 drivers/net/ieee802154/at86rf230.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ieee802154/at86rf230.c 
b/drivers/net/ieee802154/at86rf230.c
index 9f10da6..7700690 100644
--- a/drivers/net/ieee802154/at86rf230.c
+++ b/drivers/net/ieee802154/at86rf230.c
@@ -1710,9 +1710,9 @@ static int at86rf230_probe(struct spi_device *spi)
/* Reset */
if (gpio_is_valid(rstn)) {
udelay(1);
-   gpio_set_value(rstn, 0);
+   gpio_set_value_cansleep(rstn, 0);
udelay(1);
-   gpio_set_value(rstn, 1);
+   gpio_set_value_cansleep(rstn, 1);
usleep_range(120, 240);
}
 
-- 
2.5.5



Re: regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Gabriel C



On 18.12.2016 21:14, Paul E. McKenney wrote:

On Sun, Dec 18, 2016 at 07:57:42PM +, Valo, Kalle wrote:

Tobias Klausmann  writes:


A patch for this is already floating on the ML for a while now latest:
(ath9k: do not return early to fix rcu unlocking)


It's here:

https://patchwork.kernel.org/patch/9472709/


Feel free to add:

Acked-by: Paul E. McKenney 



This patch fixes the bug for me.

You can add my Tested-by if you wish.

Tested-by: Gabriel Craciunescu 


Regards,

Gabriel


Re: [PATCH v3] net: macb: Added PCI wrapper for Platform Driver.

2016-12-18 Thread Andy Shevchenko
On Wed, Dec 14, 2016 at 8:39 AM, Bartosz Folta  wrote:
> There are hardware PCI implementations of Cadence GEM network
> controller. This patch will allow to use such hardware with reuse of
> existing Platform Driver.

Since it's already applied, perhaps you would consider addressing my
below comments as follow up clean up.

> +++ b/drivers/net/ethernet/cadence/macb_pci.c
> @@ -0,0 +1,153 @@
> +/**

> + * macb_pci.c - Cadence GEM PCI wrapper.

If at some point file is renamed it will be a bit more work to change
this line. Perhaps drop file name?

> +static int macb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{
> +   int err;
> +   struct platform_device *plat_dev;
> +   struct platform_device_info plat_info;
> +   struct macb_platform_data plat_data;
> +   struct resource res[2];
> +

> +   /* sanity check */
> +   if (!id)
> +   return -EINVAL;

Is it even possible in modern kernels?

> +
> +   /* enable pci device */
> +   err = pci_enable_device(pdev);

pcim_enable_device();

> +   if (err < 0) {
> +   dev_err(>dev, "Enabling PCI device has failed: 0x%04X",

%x for error code?! Confusing.

> +   err);

> +   return -EACCES;

return err;

> +   }
> +
> +   pci_set_master(pdev);
> +
> +   /* set up resources */
> +   memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));

> +   res[0].start = pdev->resource[0].start;
> +   res[0].end = pdev->resource[0].end;

pci_resource_start()
pci_resource_end()

> +   res[0].name = PCI_DRIVER_NAME;
> +   res[0].flags = IORESOURCE_MEM;

> +   res[1].start = pdev->irq;

Btw, does it support MSI? If so and are planning to use it,
pci_irq_...() API would help here.

> +   res[1].name = PCI_DRIVER_NAME;
> +   res[1].flags = IORESOURCE_IRQ;
> +
> +   dev_info(>dev, "EMAC physical base addr = 0x%p\n",

> +(void *)(uintptr_t)pci_resource_start(pdev, 0));

Oh, %pa

> +
> +   /* set up macb platform data */
> +   memset(_data, 0, sizeof(plat_data));
> +
> +   /* initialize clocks */
> +   plat_data.pclk = clk_register_fixed_rate(>dev, "pclk", NULL, 0,
> +GEM_PCLK_RATE);
> +   if (IS_ERR(plat_data.pclk)) {
> +   err = PTR_ERR(plat_data.pclk);
> +   goto err_pclk_register;
> +   }
> +
> +   plat_data.hclk = clk_register_fixed_rate(>dev, "hclk", NULL, 0,
> +GEM_HCLK_RATE);
> +   if (IS_ERR(plat_data.hclk)) {
> +   err = PTR_ERR(plat_data.hclk);
> +   goto err_hclk_register;
> +   }
> +
> +   /* set up platform device info */
> +   memset(_info, 0, sizeof(plat_info));
> +   plat_info.parent = >dev;
> +   plat_info.fwnode = pdev->dev.fwnode;
> +   plat_info.name = PLAT_DRIVER_NAME;
> +   plat_info.id = pdev->devfn;
> +   plat_info.res = res;
> +   plat_info.num_res = ARRAY_SIZE(res);
> +   plat_info.data = _data;
> +   plat_info.size_data = sizeof(plat_data);
> +   plat_info.dma_mask = DMA_BIT_MASK(32);
> +
> +   /* register platform device */
> +   plat_dev = platform_device_register_full(_info);
> +   if (IS_ERR(plat_dev)) {
> +   err = PTR_ERR(plat_dev);
> +   goto err_plat_dev_register;
> +   }
> +
> +   pci_set_drvdata(pdev, plat_dev);
> +
> +   return 0;
> +
> +err_plat_dev_register:
> +   clk_unregister(plat_data.hclk);
> +
> +err_hclk_register:
> +   clk_unregister(plat_data.pclk);
> +

> +err_pclk_register:
> +   pci_disable_device(pdev);

Redundant when use pcim_enable_device().

> +   return err;
> +}
> +
> +static void macb_remove(struct pci_dev *pdev)
> +{
> +   struct platform_device *plat_dev = pci_get_drvdata(pdev);
> +   struct macb_platform_data *plat_data = 
> dev_get_platdata(_dev->dev);
> +
> +   platform_device_unregister(plat_dev);

> +   pci_disable_device(pdev);

Ditto.

> +   clk_unregister(plat_data->pclk);
> +   clk_unregister(plat_data->hclk);
> +}


-- 
With Best Regards,
Andy Shevchenko


[PATCH RFC net-next 1/7] sock: add sk_dst_pending_confirm flag

2016-12-18 Thread Julian Anastasov
Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.

Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.

Signed-off-by: Julian Anastasov 
---
 include/net/sock.h | 14 +-
 net/core/sock.c|  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 282d065..e83bb01 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -239,6 +239,7 @@ struct sock_common {
   *@sk_wq: sock wait queue and async head
   *@sk_rx_dst: receive input route used by early demux
   *@sk_dst_cache: destination cache
+  *@sk_dst_pending_confirm: need to confirm neighbour
   *@sk_policy: flow policy
   *@sk_receive_queue: incoming packets
   *@sk_wmem_alloc: transmit queue bytes committed
@@ -381,7 +382,7 @@ struct sock {
 #endif
struct dst_entry*sk_rx_dst;
struct dst_entry __rcu  *sk_dst_cache;
-   atomic_tsk_omem_alloc;
+   int sk_dst_pending_confirm;
int sk_sndbuf;
 
/* = cache line for TX = */
@@ -405,6 +406,8 @@ struct sock {
unsigned intsk_gso_max_size;
gfp_t   sk_allocation;
__u32   sk_txhash;
+   atomic_tsk_omem_alloc;
+   /* Note: 32bit hole on 64bit arches */
 
/*
 * Because of non atomicity rules, all
@@ -1761,6 +1764,7 @@ static inline void dst_negative_advice(struct sock *sk)
if (ndst != dst) {
rcu_assign_pointer(sk->sk_dst_cache, ndst);
sk_tx_queue_clear(sk);
+   sk->sk_dst_pending_confirm = 0;
}
}
 }
@@ -1771,6 +1775,7 @@ static inline void dst_negative_advice(struct sock *sk)
struct dst_entry *old_dst;
 
sk_tx_queue_clear(sk);
+   sk->sk_dst_pending_confirm = 0;
/*
 * This can be called while sk is owned by the caller only,
 * with no state that can be checked in a rcu_dereference_check() cond
@@ -1786,6 +1791,7 @@ static inline void dst_negative_advice(struct sock *sk)
struct dst_entry *old_dst;
 
sk_tx_queue_clear(sk);
+   sk->sk_dst_pending_confirm = 0;
old_dst = xchg((__force struct dst_entry **)>sk_dst_cache, dst);
dst_release(old_dst);
 }
@@ -1806,6 +1812,12 @@ static inline void dst_negative_advice(struct sock *sk)
 
 struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
+static inline void sk_dst_confirm(struct sock *sk)
+{
+   if (!sk->sk_dst_pending_confirm)
+   sk->sk_dst_pending_confirm = 1;
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/core/sock.c b/net/core/sock.c
index 9fa46b9..8af5296 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -502,6 +502,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 
cookie)
 
if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
sk_tx_queue_clear(sk);
+   sk->sk_dst_pending_confirm = 0;
RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
dst_release(dst);
return NULL;
@@ -1522,6 +1523,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const 
gfp_t priority)
af_family_clock_key_strings[newsk->sk_family]);
 
newsk->sk_dst_cache = NULL;
+   newsk->sk_dst_pending_confirm = 0;
newsk->sk_wmem_queued   = 0;
newsk->sk_forward_alloc = 0;
atomic_set(>sk_drops, 0);
-- 
1.9.3



[PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag

2016-12-18 Thread Julian Anastasov
Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing 
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov 
---
 include/net/sctp/sctp.h|  6 ++
 include/net/sctp/structs.h |  4 
 net/sctp/associola.c   |  3 +--
 net/sctp/output.c  | 10 +-
 net/sctp/outqueue.c|  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c  |  4 ++--
 net/sctp/transport.c   | 18 +-
 9 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index f0dcaeb..85d9083 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -586,10 +586,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
  */
 static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport 
*t)
 {
-   if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
-   dst_release(t->dst);
-   t->dst = NULL;
-   }
+   if (t->dst && !dst_check(t->dst, t->dst_cookie))
+   sctp_transport_dst_release(t);
 
return t->dst;
 }
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 92daabd..e842e84 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -838,6 +838,8 @@ struct sctp_transport {
 
__u32 burst_limited;/* Holds old cwnd when max.burst is applied */
 
+   __u32 dst_pending_confirm;  /* need to confirm neighbour */
+
/* Destination */
struct dst_entry *dst;
/* Source address. */
@@ -980,6 +982,8 @@ void sctp_transport_route(struct sctp_transport *, union 
sctp_addr *,
 void sctp_transport_reset(struct sctp_transport *);
 void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
 void sctp_transport_immediate_rtx(struct sctp_transport *);
+void sctp_transport_dst_release(struct sctp_transport *t);
+void sctp_transport_dst_confirm(struct sctp_transport *t);
 
 
 /* This is the structure we use to queue packets as they come into
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 68428e1..7bd26e0 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -820,8 +820,7 @@ void sctp_assoc_control_transport(struct sctp_association 
*asoc,
if (transport->state != SCTP_UNCONFIRMED)
transport->state = SCTP_INACTIVE;
else {
-   dst_release(transport->dst);
-   transport->dst = NULL;
+   sctp_transport_dst_release(transport);
ulp_notify = false;
}
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index f5320a8..4684a00 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -550,6 +550,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t 
gfp)
struct sctp_association *asoc = tp->asoc;
struct sctp_chunk *chunk, *tmp;
int pkt_count, gso = 0;
+   int confirm;
struct dst_entry *dst;
struct sk_buff *head;
struct sctphdr *sh;
@@ -628,7 +629,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t 
gfp)
asoc->peer.last_sent_to = tp;
}
head->ignore_df = packet->ipfragok;
-   tp->af_specific->sctp_xmit(head, tp);
+   confirm = tp->dst_pending_confirm;
+   if (confirm)
+   head->dst_pending_confirm = 1;
+   /* neighbour should be confirmed on successful transmission or
+* positive error
+*/
+   if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
+   tp->dst_pending_confirm = 0;
 
 out:
list_for_each_entry_safe(chunk, tmp, >chunk_list, list) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e540826..8234799 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1641,7 +1641,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 
if (forward_progress) {
if (transport->dst)
-   dst_confirm(transport->dst);
+   sctp_transport_dst_confirm(transport);
}
}
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 9e9690b..6fb15bf 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3317,8 +3317,7 @@ static void sctp_asconf_param_success(struct 
sctp_association *asoc,
local_bh_enable();
list_for_each_entry(transport, >peer.transport_addr_list,
transports) {
-   

[PATCH RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm

2016-12-18 Thread Julian Anastasov
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().

Reported-by: YueHaibing 
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov 
---
 net/ipv4/tcp_input.c   | 12 +++-
 net/ipv4/tcp_metrics.c |  7 ++-
 net/ipv4/tcp_output.c  |  2 ++
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6c79075..005c428 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,11 +3698,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff 
*skb, int flag)
if (tp->tlp_high_seq)
tcp_process_tlp_ack(sk, ack, flag);
 
-   if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-   struct dst_entry *dst = __sk_dst_get(sk);
-   if (dst)
-   dst_confirm(dst);
-   }
+   if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+   sk_dst_confirm(sk);
 
if (icsk->icsk_pending == ICSK_TIME_RETRANS)
tcp_schedule_loss_probe(sk);
@@ -6022,7 +6019,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff 
*skb)
break;
 
case TCP_FIN_WAIT1: {
-   struct dst_entry *dst;
int tmo;
 
/* If we enter the TCP_FIN_WAIT1 state and we are a
@@ -6049,9 +6045,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff 
*skb)
tcp_set_state(sk, TCP_FIN_WAIT2);
sk->sk_shutdown |= SEND_SHUTDOWN;
 
-   dst = __sk_dst_get(sk);
-   if (dst)
-   dst_confirm(dst);
+   sk_dst_confirm(sk);
 
if (!sock_flag(sk, SOCK_DEAD)) {
/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5..1b335d6 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -375,12 +375,10 @@ void tcp_update_metrics(struct sock *sk)
u32 val;
int m;
 
+   sk_dst_confirm(sk);
if (sysctl_tcp_nometrics_save || !dst)
return;
 
-   if (dst->flags & DST_HOST)
-   dst_confirm(dst);
-
rcu_read_lock();
if (icsk->icsk_backoff || !tp->srtt_us) {
/* This session failed to estimate rtt. Why?
@@ -493,11 +491,10 @@ void tcp_init_metrics(struct sock *sk)
struct tcp_metrics_block *tm;
u32 val, crtt = 0; /* cached RTT scaled by 8 */
 
+   sk_dst_confirm(sk);
if (!dst)
goto reset;
 
-   dst_confirm(dst);
-
rcu_read_lock();
tm = tcp_get_metrics(sk, dst, true);
if (!tm) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b45101f..dc01f35 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -974,6 +974,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff 
*skb, int clone_it,
skb_set_hash_from_sk(skb, sk);
atomic_add(skb->truesize, >sk_wmem_alloc);
 
+   skb->dst_pending_confirm = sk->sk_dst_pending_confirm;
+
/* Build TCP header and checksum it. */
th = (struct tcphdr *)skb->data;
th->source  = inet->inet_sport;
-- 
1.9.3



[PATCH RFC net-next 5/7] net: add confirm_neigh method to dst_ops

2016-12-18 Thread Julian Anastasov
Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

Signed-off-by: Julian Anastasov 
---
 include/net/arp.h  | 16 
 include/net/dst.h  |  7 +++
 include/net/dst_ops.h  |  2 ++
 include/net/ndisc.h| 17 +
 net/ipv4/route.c   | 19 +++
 net/ipv6/route.c   | 16 
 net/xfrm/xfrm_policy.c | 16 
 7 files changed, 93 insertions(+)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..65619a2 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -35,6 +35,22 @@ static inline struct neighbour *__ipv4_neigh_lookup(struct 
net_device *dev, u32
return n;
 }
 
+static inline void __ipv4_confirm_neigh(struct net_device *dev, u32 key)
+{
+   struct neighbour *n;
+
+   rcu_read_lock_bh();
+   n = __ipv4_neigh_lookup_noref(dev, key);
+   if (n) {
+   unsigned long now = jiffies;
+
+   /* avoid dirtying neighbour */
+   if (n->confirmed != now)
+   n->confirmed = now;
+   }
+   rcu_read_unlock_bh();
+}
+
 void arp_init(void);
 int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg);
 void arp_send(int type, int ptype, __be32 dest_ip,
diff --git a/include/net/dst.h b/include/net/dst.h
index 6835d22..3a3b34b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,6 +477,13 @@ static inline struct neighbour *dst_neigh_lookup_skb(const 
struct dst_entry *dst
return IS_ERR(n) ? NULL : n;
 }
 
+static inline void dst_confirm_neigh(const struct dst_entry *dst,
+const void *daddr)
+{
+   if (dst->ops->confirm_neigh)
+   dst->ops->confirm_neigh(dst, daddr);
+}
+
 static inline void dst_link_failure(struct sk_buff *skb)
 {
struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index a0d443c..fc1bdb4 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -33,6 +33,8 @@ struct dst_ops {
struct neighbour *  (*neigh_lookup)(const struct dst_entry *dst,
struct sk_buff *skb,
const void *daddr);
+   void(*confirm_neigh)(const struct dst_entry *dst,
+   const void *daddr);
 
struct kmem_cache   *kmem_cachep;
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index d562a2f..8a02146 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -391,6 +391,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct 
net_device *dev, cons
return n;
 }
 
+static inline void __ipv6_confirm_neigh(struct net_device *dev,
+   const void *pkey)
+{
+   struct neighbour *n;
+
+   rcu_read_lock_bh();
+   n = __ipv6_neigh_lookup_noref(dev, pkey);
+   if (n) {
+   unsigned long now = jiffies;
+
+   /* avoid dirtying neighbour */
+   if (n->confirmed != now)
+   n->confirmed = now;
+   }
+   rcu_read_unlock_bh();
+}
+
 int ndisc_init(void);
 int ndisc_late_init(void);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fa5c037..682ff24 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -154,6 +154,7 @@ static u32 *ipv4_cow_metrics(struct dst_entry *dst, 
unsigned long old)
 static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
   struct sk_buff *skb,
   const void *daddr);
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr);
 
 static struct dst_ops ipv4_dst_ops = {
.family =   AF_INET,
@@ -168,6 +169,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct 
dst_entry *dst,
.redirect = ip_do_redirect,
.local_out =__ip_local_out,
.neigh_lookup = ipv4_neigh_lookup,
+   .confirm_neigh =ipv4_confirm_neigh,
 };
 
 #define ECN_OR_COST(class) TC_PRIO_##class
@@ -461,6 +463,23 @@ static struct neighbour *ipv4_neigh_lookup(const struct 
dst_entry *dst,
return neigh_create(_tbl, pkey, dev);
 }
 
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+   struct net_device *dev = dst->dev;
+   const __be32 *pkey = daddr;
+   const struct rtable *rt;
+
+   rt = (const struct rtable *)dst;
+   if (rt->rt_gateway)
+   pkey = (const __be32 *)>rt_gateway;
+   else if (!daddr ||
+(rt->rt_flags &
+ (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL)))
+   return;
+
+   __ipv4_confirm_neigh(dev, *(__force 

[PATCH RFC net-next 0/7] net: dst_confirm replacement

2016-12-18 Thread Julian Anastasov
This patchset addresses the problem of neighbour
confirmation where received replies from one nexthop
can cause confirmation of different nexthop when using
the same dst. Thanks to YueHaibing 
for tracking the dst->pending_confirm problem.

Sockets can obtain cached output route. Such
routes can be to known nexthop (rt_gateway=IP) or to be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).

At first look, there are more problems:

- dst_confirm() sets flag on dst and not on dst->path,
as result, indication is lost when XFRM is used

- DNAT can change the nexthop, so the really used nexthop is
not confirmed

So, the following solution is to avoid using
dst->pending_confirm.

The current dst_confirm() usage is as follows:

Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*

Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6

MSG_CONFIRM for other purposes (fix not needed):
- CAN

Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)

Redirects from old to new GW:
- rt6_do_redirect


The patchset includes the following changes:

1. sock: add sk_dst_pending_confirm flag

- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm

2. net: add dst_pending_confirm flag to skbuff

- skb->dst_pending_confirm will be used by all protocols
in following patches

3. sctp: add dst_pending_confirm flag

- SCTP uses per-transport dsts and can not use
sk->sk_dst_pending_confirm like TCP

4. tcp: replace dst_confirm with sk_dst_confirm

5. net: add confirm_neigh method to dst_ops

- IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.

- xfrm_confirm_neigh: support is incomplete here, only routes with
known nexthops (gateway) are supported because the tunnel address
is slow to obtain. Or there is solution to this problem?

6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP

- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6

- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).

- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
_hdr(skb)->saddr is the correct nexthop address to use here.

7. net: pending_confirm is not used anymore

- I failed to understand the CXGB* code, I see dst_confirm()
calls but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?

- Now may be old function neigh_output() should be restored
instead of dst_neigh_output?


Julian Anastasov (7):
  sock: add sk_dst_pending_confirm flag
  net: add dst_pending_confirm flag to skbuff
  sctp: add dst_pending_confirm flag
  tcp: replace dst_confirm with sk_dst_confirm
  net: add confirm_neigh method to dst_ops
  net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  net: pending_confirm is not used anymore

 drivers/net/vrf.c  |  5 -
 include/linux/skbuff.h |  4 +++-
 include/net/arp.h  | 16 
 include/net/dst.h  | 21 +
 include/net/dst_ops.h  |  2 ++
 include/net/ndisc.h| 17 +
 include/net/sctp/sctp.h|  6 ++
 include/net/sctp/structs.h |  4 
 include/net/sock.h | 28 +++-
 net/core/dst.c |  1 -
 net/core/sock.c|  2 ++
 net/ipv4/ip_output.c   | 11 ++-
 net/ipv4/ping.c|  3 ++-
 net/ipv4/raw.c |  6 +-
 net/ipv4/route.c   | 19 +++
 net/ipv4/tcp_input.c   | 12 +++-
 net/ipv4/tcp_metrics.c |  7 ++-
 net/ipv4/tcp_output.c  |  2 ++
 net/ipv4/udp.c |  3 ++-
 net/ipv6/ip6_output.c  |  7 +++
 net/ipv6/raw.c |  6 +-
 net/ipv6/route.c   | 43 ++-
 net/ipv6/udp.c |  3 ++-
 net/l2tp/l2tp_ip6.c|  3 ++-
 net/sctp/associola.c   |  3 +--
 net/sctp/output.c  | 10 +-
 net/sctp/outqueue.c|  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c  |  4 ++--
 net/sctp/transport.c   | 18 +-
 net/xfrm/xfrm_policy.c | 16 
 32 files changed, 226 insertions(+), 66 deletions(-)

-- 
1.9.3



[PATCH RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP

2016-12-18 Thread Julian Anastasov
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.

Reported-by: YueHaibing 
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov 
---
 net/ipv4/ip_output.c  |  6 ++
 net/ipv4/ping.c   |  3 ++-
 net/ipv4/raw.c|  6 +-
 net/ipv4/udp.c|  3 ++-
 net/ipv6/ip6_output.c |  6 ++
 net/ipv6/raw.c|  6 +-
 net/ipv6/route.c  | 27 ++-
 net/ipv6/udp.c|  3 ++-
 net/l2tp/l2tp_ip6.c   |  3 ++-
 9 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fbe63cc..3afab90 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -889,6 +889,9 @@ static inline int ip_ufo_append_data(struct sock *sk,
 
skb->csum = 0;
 
+   if (flags & MSG_CONFIRM)
+   skb->dst_pending_confirm = 1;
+
__skb_queue_tail(queue, skb);
} else if (skb_is_gso(skb)) {
goto append;
@@ -1089,6 +1092,9 @@ static int __ip_append_data(struct sock *sk,
exthdrlen = 0;
csummode = CHECKSUM_NONE;
 
+   if ((flags & MSG_CONFIRM) && !skb_prev)
+   skb->dst_pending_confirm = 1;
+
/*
 * Put the packet on the pending queue.
 */
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 86cca61..f3f6f50 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -848,7 +848,8 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t len)
return err;
 
 do_confirm:
-   dst_confirm(>dst);
+   if (msg->msg_flags & MSG_PROBE)
+   dst_confirm_neigh(>dst, );
if (!(msg->msg_flags & MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2300fae..47c5451 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -383,6 +383,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 
*fl4,
 
sock_tx_timestamp(sk, sockc->tsflags, _shinfo(skb)->tx_flags);
 
+   if (flags & MSG_CONFIRM)
+   skb->dst_pending_confirm = 1;
+
skb->transport_header = skb->network_header;
err = -EFAULT;
if (memcpy_from_msg(iph, msg, length))
@@ -666,7 +669,8 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
return len;
 
 do_confirm:
-   dst_confirm(>dst);
+   if (msg->msg_flags & MSG_PROBE)
+   dst_confirm_neigh(>dst, );
if (!(msg->msg_flags & MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9ca279b..22a1eb1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1113,7 +1113,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
return err;
 
 do_confirm:
-   dst_confirm(>dst);
+   if (msg->msg_flags & MSG_PROBE)
+   dst_confirm_neigh(>dst, >daddr);
if (!(msg->msg_flags_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 285aa9f..efe8b3f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1145,6 +1145,9 @@ static inline int ip6_ufo_append_data(struct sock *sk,
skb->protocol = htons(ETH_P_IPV6);
skb->csum = 0;
 
+   if (flags & MSG_CONFIRM)
+   skb->dst_pending_confirm = 1;
+
__skb_queue_tail(queue, skb);
} else if (skb_is_gso(skb)) {
goto append;
@@ -1517,6 +1520,9 @@ static int __ip6_append_data(struct sock *sk,
exthdrlen = 0;
dst_exthdrlen = 0;
 
+   if ((flags & MSG_CONFIRM) && !skb_prev)
+   skb->dst_pending_confirm = 1;
+
/*
 * Put the packet on the pending queue
 */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc2..0597548 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -650,6 +650,9 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr 
*msg, int length,
 
skb->ip_summed = CHECKSUM_NONE;
 
+   if (flags & MSG_CONFIRM)
+   skb->dst_pending_confirm = 1;
+
skb->transport_header = skb->network_header;
err = 

[PATCH RFC net-next 7/7] net: pending_confirm is not used anymore

2016-12-18 Thread Julian Anastasov
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.

Reported-by: YueHaibing 
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov 
---
 include/net/dst.h | 14 ++
 net/core/dst.c|  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3a3b34b..84a1043 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -59,8 +59,6 @@ struct dst_entry {
 #define DST_XFRM_QUEUE 0x0100
 #define DST_METADATA   0x0200
 
-   unsigned short  pending_confirm;
-
short   error;
 
/* A non-zero value of dst->obsolete forces by-hand validation
@@ -78,6 +76,8 @@ struct dst_entry {
 #define DST_OBSOLETE_KILL  -2
unsigned short  header_len; /* more space at head required 
*/
unsigned short  trailer_len;/* space to reserve at tail */
+   unsigned short  __pad3;
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
__u32   tclassid;
 #else
@@ -440,7 +440,6 @@ static inline void dst_rcu_free(struct rcu_head *head)
 
 static inline void dst_confirm(struct dst_entry *dst)
 {
-   dst->pending_confirm = 1;
 }
 
 static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
@@ -448,15 +447,6 @@ static inline int dst_neigh_output(struct dst_entry *dst, 
struct neighbour *n,
 {
const struct hh_cache *hh;
 
-   if (dst->pending_confirm) {
-   unsigned long now = jiffies;
-
-   dst->pending_confirm = 0;
-   /* avoid dirtying neighbour */
-   if (n->confirmed != now)
-   n->confirmed = now;
-   }
-
hh = >hh;
if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
return neigh_hh_output(hh, skb);
diff --git a/net/core/dst.c b/net/core/dst.c
index b5cbbe0..960e503 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
dst->__use = 0;
dst->lastuse = jiffies;
dst->flags = flags;
-   dst->pending_confirm = 0;
dst->next = NULL;
if (!(flags & DST_NOCOUNT))
dst_entries_add(ops, 1);
-- 
1.9.3



[PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff

2016-12-18 Thread Julian Anastasov
Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.

Signed-off-by: Julian Anastasov 
---
 drivers/net/vrf.c  |  5 -
 include/linux/skbuff.h |  4 +++-
 include/net/sock.h | 14 ++
 net/ipv4/ip_output.c   |  5 -
 net/ipv6/ip6_output.c  |  1 +
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7532646..b118d2b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -377,6 +377,7 @@ static int vrf_finish_output6(struct net *net, struct sock 
*sk,
if (unlikely(!neigh))
neigh = __neigh_create(_tbl, nexthop, dst->dev, false);
if (!IS_ERR(neigh)) {
+   sock_confirm_neigh(skb, neigh);
ret = dst_neigh_output(dst, neigh, skb);
rcu_read_unlock_bh();
return ret;
@@ -573,8 +574,10 @@ static int vrf_finish_output(struct net *net, struct sock 
*sk, struct sk_buff *s
neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
if (unlikely(!neigh))
neigh = __neigh_create(_tbl, , dev, false);
-   if (!IS_ERR(neigh))
+   if (!IS_ERR(neigh)) {
+   sock_confirm_neigh(skb, neigh);
ret = dst_neigh_output(dst, neigh, skb);
+   }
 
rcu_read_unlock_bh();
 err:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac7fa34..94d7c36 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -610,6 +610,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp 
*t1,
  * @wifi_acked_valid: wifi_acked was set
  * @wifi_acked: whether frame was acked on wifi or not
  * @no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
+ * @dst_pending_confirm: need to confirm neighbour
   *@napi_id: id of the NAPI struct this skb came from
  * @secmark: security marking
  * @mark: Generic packet mark
@@ -740,6 +741,7 @@ struct sk_buff {
__u8csum_level:2;
__u8csum_bad:1;
 
+   __u8dst_pending_confirm:1;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
__u8ndisc_nodetype:2;
 #endif
@@ -749,7 +751,7 @@ struct sk_buff {
 #ifdef CONFIG_NET_SWITCHDEV
__u8offload_fwd_mark:1;
 #endif
-   /* 2, 4 or 5 bit hole */
+   /* 1, 3 or 4 bit hole */
 
 #ifdef CONFIG_NET_SCHED
__u16   tc_index;   /* traffic control index */
diff --git a/include/net/sock.h b/include/net/sock.h
index e83bb01..bd63d4d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1818,6 +1818,20 @@ static inline void sk_dst_confirm(struct sock *sk)
sk->sk_dst_pending_confirm = 1;
 }
 
+static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
+{
+   if (unlikely(skb->dst_pending_confirm)) {
+   struct sock *sk = skb->sk;
+   unsigned long now = jiffies;
+
+   /* avoid dirtying neighbour */
+   if (n->confirmed != now)
+   n->confirmed = now;
+   if (sk && sk->sk_dst_pending_confirm)
+   sk->sk_dst_pending_confirm = 0;
+   }
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6c9615c..fbe63cc 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -222,7 +222,10 @@ static int ip_finish_output2(struct net *net, struct sock 
*sk, struct sk_buff *s
if (unlikely(!neigh))
neigh = __neigh_create(_tbl, , dev, false);
if (!IS_ERR(neigh)) {
-   int res = dst_neigh_output(dst, neigh, skb);
+   int res;
+
+   sock_confirm_neigh(skb, neigh);
+   res = dst_neigh_output(dst, neigh, skb);
 
rcu_read_unlock_bh();
return res;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 70d0de40..285aa9f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -119,6 +119,7 @@ static int ip6_finish_output2(struct net *net, struct sock 
*sk, struct sk_buff *
if (unlikely(!neigh))
neigh = __neigh_create(_tbl, nexthop, dst->dev, false);
if (!IS_ERR(neigh)) {
+   sock_confirm_neigh(skb, neigh);
ret = dst_neigh_output(dst, neigh, skb);
rcu_read_unlock_bh();
return ret;
-- 
1.9.3



[PATCH] stmmac: fix memory barriers

2016-12-18 Thread Pavel Machek

Fix up memory barriers in stmmac driver. They are meant to protect
against DMA engine, so smp_ variants are certainly wrong, and dma_
variants are preferable.

Signed-off-by: Pavel Machek 

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index a340fc8..8816515 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -334,7 +334,7 @@ static void dwmac4_rd_prepare_tx_desc(struct dma_desc *p, 
int is_fs, int len,
 * descriptors for the same frame has to be set before, to
 * avoid race condition.
 */
-   wmb();
+   dma_wmb();
 
p->des3 = cpu_to_le32(tdes3);
 }
@@ -377,7 +377,7 @@ static void dwmac4_rd_prepare_tso_tx_desc(struct dma_desc 
*p, int is_fs,
 * descriptors for the same frame has to be set before, to
 * avoid race condition.
 */
-   wmb();
+   dma_wmb();
 
p->des3 = cpu_to_le32(tdes3);
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c 
b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
index ce97e52..f0d8632 100644
--- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
@@ -350,7 +350,7 @@ static void enh_desc_prepare_tx_desc(struct dma_desc *p, 
int is_fs, int len,
 * descriptors for the same frame has to be set before, to
 * avoid race condition.
 */
-   wmb();
+   dma_wmb();
 
p->des0 = cpu_to_le32(tdes0);
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..bb40382 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2125,7 +2125,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, 
struct net_device *dev)
 * descriptor and then barrier is needed to make sure that
 * all is coherent before granting the DMA engine.
 */
-   smp_wmb();
+   dma_wmb();
 
if (netif_msg_pktdata(priv)) {
pr_info("%s: curr=%d dirty=%d f=%d, e=%d, f_p=%p, nfrags %d\n",
@@ -2338,7 +2338,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, 
struct net_device *dev)
 * descriptor and then barrier is needed to make sure that
 * all is coherent before granting the DMA engine.
 */
-   smp_wmb();
+   dma_wmb();
}
 
netdev_sent_queue(dev, skb->len);
@@ -2443,14 +2443,14 @@ static inline void stmmac_rx_refill(struct stmmac_priv 
*priv)
netif_dbg(priv, rx_status, priv->dev,
  "refill entry #%d\n", entry);
}
-   wmb();
+   dma_wmb();
 
if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00))
priv->hw->desc->init_rx_desc(p, priv->use_riwt, 0, 0);
else
priv->hw->desc->set_rx_owner(p);
 
-   wmb();
+   dma_wmb();
 
entry = STMMAC_GET_ENTRY(entry, DMA_RX_SIZE);
}

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCHv2 1/5] sh_eth: add generic wake-on-lan support via magic packet

2016-12-18 Thread Sergei Shtylyov

Hello.

On 12/12/2016 07:09 PM, Niklas Söderlund wrote:


Add generic functionality to support Wake-on-Lan using MagicPacket which
are supported by at least a few versions of sh_eth. Only add
functionality for WoL, no specific sh_eth version are marked to support


   Versions.


WoL yet.

WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts expect MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic, this prevents
the driver from simply restoring the registers and to take advantage of
that sh_eth was not suspended to reduce resume time. To reset the
hardware the driver close and reopens the device just like it would do


   Closes.


in a normal suspend/resume scenario without WoL enabled, but it both
close and open the device in the resume callback since the device needs


   Closes and opens.


to be open for WoL to work.



One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the


   I tried to find the code in question and failed, getting muddled in the 
RPM maze. Could you point at this code for my education? :-)



suspend callback need to call clk_enable() directly to increase the


   My main concern is why we need to manipulate the clock directly --
can't you call RPM to achieve the same effect?


usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.


   You mean it does this even though we don't call pr_runtime_put_sync()
as done in sh_eth_close()?


Signed-off-by: Niklas Söderlund 

[...]

MBR, Sergei



Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-18 Thread Pavel Machek
Hi!

> > For the same reason it's broken if it races with the transmit path: it
> > can release driver resources while the transmit path uses these.
> > 
> > Btw the points below may not matter/hurt much for a proof a concept
> > but they would need to be addressed as well:
> > 1) unchecked (and avoidable) extra error paths due to stmmac_release()
> > 2) racy cancel_work_sync. Low probability as it is, an irq + error could
> >take place right after cancel_work_sync
> 
> It was indeed only meant as a proof of concept. Nevertheless the race is not 
> good, since one can run into it when faking the tx error for testings 
> purposes.
> So below is a slightly improved version of the restart handling.
> Its not meant as a final version either. But maybe we can use it as a starting
> point.

Certainly works better than version we currently have in tree. I'm
running it in a loop, and it survived 10 minutes of testing so
far. (Previous version killed the hardware at first iteration.)

> Again the patch is only compile tested.

Tested-by: Pavel Machek 

Thanks!
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Paul E. McKenney
On Sun, Dec 18, 2016 at 07:57:42PM +, Valo, Kalle wrote:
> Tobias Klausmann  writes:
> 
> > A patch for this is already floating on the ML for a while now latest:
> > (ath9k: do not return early to fix rcu unlocking)
> 
> It's here:
> 
> https://patchwork.kernel.org/patch/9472709/

Feel free to add:

Acked-by: Paul E. McKenney 

Thanx, Paul

> > Hopefully Kalle will include it in one of his upcoming pull requests.
> 
> Yes, I'll try to get it to 4.10-rc2.
> 
> -- 
> Kalle Valo



Re: wl1251 & mac address & calibration data

2016-12-18 Thread Arend Van Spriel
On 18-12-2016 13:09, Pali Rohár wrote:
> On Sunday 18 December 2016 12:54:00 Arend Van Spriel wrote:
>> On 18-12-2016 12:04, Pali Rohár wrote:
>>> On Sunday 18 December 2016 11:49:53 Arend Van Spriel wrote:
 On 16-12-2016 11:40, Pali Rohár wrote:
> On Friday 16 December 2016 08:25:44 Daniel Wagner wrote:
>> On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
>>> For the new API a solution for "fallback mechanisms" should be
>>> clean though and I am looking to stay as far as possible from
>>> the existing mess. A solution to help both the old API and new
>>> API is possible for the "fallback mechanism" though -- but for
>>> that I can only refer you at this point to some of Daniel
>>> Wagner and Tom Gunderson's firmwared deamon prospect. It
>>> should help pave the way for a clean solution and help address
>>> other stupid issues.
>>
>> The firmwared project is hosted here
>>
>> https://github.com/teg/firmwared
>>
>> As Luis pointed out, firmwared relies on
>> FW_LOADER_USER_HELPER_FALLBACK, which is not enabled by default.
>
> I know. But it does not mean that I cannot enable this option at
> kernel compile time.
>
> Bigger problem is that currently request_firmware() first try to
> load firmware directly from VFS and after that (if fails)
> fallback to user helper.
>
> So I would need to extend kernel firmware code with new function
> (or flag) to not use VFS and try only user mode helper.

 Why do you need the user-mode helper anyway. This is all static
 data, right?
>>>
>>> Those are static data, but device specific!
>>
>> So what?
>>
 So why not cook up a firmware file in user-space once and put
 it in /lib/firmware for the driver to request directly.
>>>
>>> 1. Violates FHS
>>
>> How?
>>
>>> 2. Does not work for readonly /, readonly /lib, readonly
>>> /lib/firmware
>>
>> Que?
>>
>>> 3. Backup & restore of rootfs between same devices does not work
>>> (as rootfs now contains device specific data).
>>
>> True.
>>
>>> 4. Sharing one rootfs (either via nfs or other technology) does not
>>> work for more devices (even in state when rootfs is used only by
>>> one device at one time).
>>
>> Indeed.
>>
>>> And it is common that N900 developers have rootfs in laptop and via
>>> usb (cdc_ether) exports it over nfs to N900 device and boot
>>> system. It basically break booting from one nfs-exported rootfs,
>>> as that export become model specific...
>>
>> These are all you choices and more a logistic issue. If your take is
>> that udev is the way to solve those, fine by me.
>>
 Seems a bit
 overkill to have a {e,}udev or whatever daemon running if the
 result is always the same. Just my 2 cents.
>>>
>>> No it is not. It will break couple of other things in Linux and
>>> device
>>
>> Now I am curious. What "couple of other things" will be broken.
>>
>>> and model specific calibration data should not be in /lib/firmware!
>>> That directory is used for firmware files, not calibration.
>>
>> What is "firmware"? Really. These are binary blobs required to make
>> the device work. And guess what, your device needs calibration data.
>> Why make the distinction.
>>
>> Regards,
>> Arend
> 
> File wl1251-nvs.bin is provided by linux-firmware package and contains 
> default data which should be overriden by model specific calibrated 
> data.

Ah. Someone thought it was a good idea to provide the "one ring to rule
them all". Nice.

> But overwriting that one file is not possible as it next update of 
> linux-firmware package will overwrite it back. It break any normal usage 
> of package management.
> 
> Also it is ridiculously broken by design if some "boot" files needs to 
> be overwritten to initialize hardware properly. To not break booting you 
> need to overwrite that file before first boot. But without booting 
> device you cannot read calibration data. So some hack with autoreboot 
> after boot is needed. And how to detect that we have real overwritten 
> calibration data and not default one from linux-firmware? Any heuristic 
> or checks will be broken here. And no, nothing like you need to reboot 
> your device now (and similar concept) from windows world is not 
> accepted.

Well. After reading and creating calibration data you could just rebind
the driver to the device to have it probed again. But yeah, the default
one from linux-firmware should never have been there in the first place.

> "firmware" is one for chip. Any N900 device with wl1251 chip needs 
> exactly same firmware "wl1251-fw.bin". But every N900 needs different 
> calibration data which is not firmware.

Ok. This is exactly why Luis is giving the new API different name just
calling it "data".

Regards,
Arend


Re: regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Tobias Klausmann


On 18.12.2016 20:57, Valo, Kalle wrote:

Tobias Klausmann  writes:


A patch for this is already floating on the ML for a while now latest:
(ath9k: do not return early to fix rcu unlocking)

It's here:

https://patchwork.kernel.org/patch/9472709/


Good to know!



Hopefully Kalle will include it in one of his upcoming pull requests.

Yes, I'll try to get it to 4.10-rc2.


Thanks for the update!


Re: regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Valo, Kalle
Tobias Klausmann  writes:

> A patch for this is already floating on the ML for a while now latest:
> (ath9k: do not return early to fix rcu unlocking)

It's here:

https://patchwork.kernel.org/patch/9472709/

> Hopefully Kalle will include it in one of his upcoming pull requests.

Yes, I'll try to get it to 4.10-rc2.

-- 
Kalle Valo

[PATCH net v2] ipvlan: fix crash when master is set in loopback mode

2016-12-18 Thread Mahesh Bandewar
From: Mahesh Bandewar 

In an IPvlan setup when master is set in loopback mode e.g.

  ethtool -K eth0 set loopback on

  where eth0 is master device for IPvlan setup.

The failure actually happens while processing mulitcast packets
but that's a result of unconditionally queueing packets without
ensuring ether-header is part of the linear part of skb.

This patch forces this check at the reception and drops packets
which fail this check before queuing them.

[ cut here ]
kernel BUG at include/linux/skbuff.h:1737!
Call Trace:
 [] dev_forward_skb+0x92/0xd0
 [] ipvlan_process_multicast+0x395/0x4c0 [ipvlan]
 [] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan]
 [] ? process_one_work+0x147/0x660
 [] process_one_work+0x1a9/0x660
 [] ? process_one_work+0x147/0x660
 [] worker_thread+0x11d/0x360
 [] ? rescuer_thread+0x350/0x350
 [] kthread+0xdb/0xe0
 [] ? _raw_spin_unlock_irq+0x30/0x50
 [] ? flush_kthread_worker+0xc0/0xc0
 [] ret_from_fork+0x9a/0xd0
 [] ? flush_kthread_worker+0xc0/0xc0

Signed-off-by: Mahesh Bandewar 
---
v1->v2: commit log update

 drivers/net/ipvlan/ipvlan_core.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index b4e990743e1d..4294fc1f5564 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -660,6 +660,9 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff 
**pskb)
if (!port)
return RX_HANDLER_PASS;
 
+   if (unlikely(!pskb_may_pull(skb, sizeof(struct ethhdr
+   goto out;
+
switch (port->mode) {
case IPVLAN_MODE_L2:
return ipvlan_handle_mode_l2(pskb, port);
@@ -672,6 +675,8 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff 
**pskb)
/* Should not reach here */
WARN_ONCE(true, "ipvlan_handle_frame() called for mode = [%hx]\n",
  port->mode);
+
+out:
kfree_skb(skb);
return RX_HANDLER_CONSUMED;
 }
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-18 Thread Pavel Machek
Hi!

> > - e1efa87241272104d6a12c8b9fcdc4f62634d447
> 
> Yep, a sync of the dma descriptors before the hardware gets ownership of the 
> tx tail
> idx is missing in the stmmac, too.

I can reproduce failure with 4.4 fairly easily. I tried with dma_
variant of barriers, and it failed, too

[ 1018.410012] stmmac: early irq
[ 1023.939702] fpga_cmd_read:wait_event timed out!
[ 1033.128692] [ cut here ]
[ 1033.133329] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303
dev_watchdog+0x264/0x284()
[ 1033.141748] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0
timed out
[ 1033.148861] Modules linked in:

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: mlx4: Bug in XDP_TX + 16 rx-queues

2016-12-18 Thread Martin KaFai Lau
On Sun, Dec 18, 2016 at 12:31:30PM +0200, Tariq Toukan wrote:
> Hi Martin,
>
>
> On 17/12/2016 12:18 PM, Martin KaFai Lau wrote:
> >Hi All,
> >
> >I have been debugging with XDP_TX and 16 rx-queues.
> >
> >1) When 16 rx-queues is used and an XDP prog is doing XDP_TX,
> >it seems that the packet cannot be XDP_TX out if the pkt
> >is received from some particular CPUs (/rx-queues).
> Does the rx_xdp_tx_full counter increase?
The rx_xdp_tx_full counter did not increase.  A capture of
ethtool -S eth0:

[root@kerneltest003.14.prn2 ~]# ethtool -S eth0 | egrep 'rx.*_xdp_tx.*:'
rx_xdp_tx: 1024
rx_xdp_tx_full: 0
rx0_xdp_tx: 64
rx0_xdp_tx_full: 0
rx1_xdp_tx: 64
rx1_xdp_tx_full: 0
rx2_xdp_tx: 64
rx2_xdp_tx_full: 0
rx3_xdp_tx: 64
rx3_xdp_tx_full: 0
rx4_xdp_tx: 64
rx4_xdp_tx_full: 0
rx5_xdp_tx: 64
rx5_xdp_tx_full: 0
rx6_xdp_tx: 64
rx6_xdp_tx_full: 0
rx7_xdp_tx: 64
rx7_xdp_tx_full: 0
rx8_xdp_tx: 64
rx8_xdp_tx_full: 0
rx9_xdp_tx: 63
rx9_xdp_tx_full: 0
rx10_xdp_tx: 65
rx10_xdp_tx_full: 0
rx11_xdp_tx: 64
rx11_xdp_tx_full: 0
rx12_xdp_tx: 64
rx12_xdp_tx_full: 0
rx13_xdp_tx: 64
rx13_xdp_tx_full: 0
rx14_xdp_tx: 64
rx14_xdp_tx_full: 0
rx15_xdp_tx: 64
rx15_xdp_tx_full: 0

> Does the problem repro if you turn off PFC?
> ethtool -A  rx off tx off
Turning pause off does not help.

> >
> >2) If 8 rx-queues is used, it does not have problem.
> >
> >3) The 16 rx-queues problem also went away after reverting these
> >two patches:
> >15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases
> >67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme
> >
> >4) I can reproduce the problem by running samples/bof/xdp_ip_tunnel at
> >the receiver side.  The sender side sends out TCP packets with
> >source port ranging from 1 to 1024.  At the sender side also, do
> >a tcpdump to capture the ip-tunnel packet reflected by xdp_ip_tunnel.
> >With 8 rx-queues,  I can get all 1024 packets back.  With 16 rx-queues,
> >I can only get 512 packets back.  It is a 40 CPUs machine.
> >I also checked the rx*_xdp_tx counters (from ethtool -S eth0) to ensure
> >the xdp prog has XDP_TX-ed it out.
> So all packets were transmitted (according to rx*_xdp_tx), and only half the
> of them received on the other side?
Correct.  The XDP program 'samples/bpf/xdp_tx_iptunnel' received,
processed and sent out 1024 packets.  The rx*_xdp_tx also showed all of the
1024 packets.  However, only half of them reached to the other side (by
observing the tcpdump) when 16 rx-queues was used.

Thanks,
--Martin


Re: [PATCH net] ipvlan: fix crash

2016-12-18 Thread महेश बंडेवार
On Sat, Dec 17, 2016 at 8:54 PM, David Miller  wrote:
> From: Mahesh Bandewar 
> Date: Sat, 17 Dec 2016 18:16:19 -0800
>
>> diff --git a/drivers/net/ipvlan/ipvlan_core.c 
>> b/drivers/net/ipvlan/ipvlan_core.c
>> index b4e990743e1d..4294fc1f5564 100644
>> --- a/drivers/net/ipvlan/ipvlan_core.c
>> +++ b/drivers/net/ipvlan/ipvlan_core.c
>> @@ -660,6 +660,9 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff 
>> **pskb)
>>   if (!port)
>>   return RX_HANDLER_PASS;
>>
>> + if (unlikely(!pskb_may_pull(skb, sizeof(struct ethhdr
>> + goto out;
>> +
>>   switch (port->mode) {
>
> ipvlan only allows non-loopback ethernet devices to register
> this RX handler.
>
> Such situations being tested here should therefore be completely
> impossible.
>
Yes, correct. This happens when the master device is set in loopback mode.

> Every such device must send the SKB through eth_type_trans(), which
> unconditionally accesses the ethernet header, therefore it must
> be pulled into the linear SKB area already, long before this RX
> handler is invoked.
>
> If this really can legitimately happen, you must explain how so.
>
OK, will update the commit log.

> Just showing the crash that later happens in some (completely
> unrelated BTW) ipvlan multicast workqueue handling function, is
> really an insufficient commit log message for a bug like this.


[PATCH iproute2 1/1] tc: updated man page to reflect filter-id use in filter GET command.

2016-12-18 Thread Roman Mashak
Signed-off-by: Roman Mashak 
---
 man/man8/tc.8 | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 583b72f..f96911a 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -32,7 +32,8 @@ tc \- show / manipulate traffic control settings
 \fIDEV\fR
 .B [ parent
 \fIqdisc-id\fR
-.B | root ] protocol
+.B | root ] [ handle \fIfilter-id\fR ]
+.B protocol
 \fIprotocol\fR
 .B prio
 \fIpriority\fR filtertype
@@ -577,7 +578,8 @@ it is created.
 
 .TP
 get
-Displays a single filter given the interface, parent ID, priority, protocol 
and handle ID.
+Displays a single filter given the interface \fIDEV\fR, \fIqdisc-id\fR,
+\fIpriority\fR, \fIprotocol\fR and \fIfilter-id\fR.
 
 .TP
 show
-- 
1.9.1



[PATCH iproute2 1/1] tc: fixed man page fonts for keywords and variable values

2016-12-18 Thread Roman Mashak
Signed-off-by: Roman Mashak 
---
 man/man8/tc.8 | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 8a47a2b..583b72f 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -5,58 +5,58 @@ tc \- show / manipulate traffic control settings
 .B tc
 .RI "[ " OPTIONS " ]"
 .B qdisc [ add | change | replace | link | delete ] dev
-DEV
+\fIDEV\fR
 .B
 [ parent
-qdisc-id
+\fIqdisc-id\fR
 .B | root ]
 .B [ handle
-qdisc-id ] qdisc
+\fIqdisc-id\fR ] qdisc
 [ qdisc specific parameters ]
 .P
 
 .B tc
 .RI "[ " OPTIONS " ]"
 .B class [ add | change | replace | delete ] dev
-DEV
+\fIDEV\fR
 .B parent
-qdisc-id
+\fIqdisc-id\fR
 .B [ classid
-class-id ] qdisc
+\fIclass-id\fR ] qdisc
 [ qdisc specific parameters ]
 .P
 
 .B tc
 .RI "[ " OPTIONS " ]"
 .B filter [ add | change | replace | delete | get ] dev
-DEV
+\fIDEV\fR
 .B [ parent
-qdisc-id
+\fIqdisc-id\fR
 .B | root ] protocol
-protocol
+\fIprotocol\fR
 .B prio
-priority filtertype
+\fIpriority\fR filtertype
 [ filtertype specific parameters ]
 .B flowid
-flow-id
+\fIflow-id\fR
 
 .B tc
 .RI "[ " OPTIONS " ]"
 .RI "[ " FORMAT " ]"
 .B qdisc show [ dev
-DEV
+\fIDEV\fR
 .B ]
 .P
 .B tc
 .RI "[ " OPTIONS " ]"
 .RI "[ " FORMAT " ]"
 .B class show dev
-DEV
+\fIDEV\fR
 .P
 .B tc
 .RI "[ " OPTIONS " ]"
 .B filter show dev
-DEV
+\fIDEV\fR
 
 .P
 .ti 8
@@ -294,14 +294,14 @@ In the absence of classful qdiscs, classless qdiscs can 
only be attached at
 the root of a device. Full syntax:
 .P
 .B tc qdisc add dev
-DEV
+\fIDEV\fR
 .B root
 QDISC QDISC-PARAMETERS
 
 To remove, issue
 .P
 .B tc qdisc del dev
-DEV
+\fIDEV\fR
 .B root
 
 The
@@ -386,7 +386,7 @@ Type of Service
 Some qdiscs have built in rules for classifying packets based on the TOS field.
 .TP
 skb->priority
-Userspace programs can encode a class-id in the 'skb->priority' field using
+Userspace programs can encode a \fIclass-id\fR in the 'skb->priority' field 
using
 the SO_PRIORITY option.
 .P
 Each node within the tree can have its own filters but higher level filters
@@ -554,7 +554,7 @@ must be passed, either by passing its ID or by attaching 
directly to the root of
 When creating a qdisc or a filter, it can be named with the
 .B handle
 parameter. A class is named with the
-.B classid
+.B \fBclassid\fR
 parameter.
 
 .TP
-- 
1.9.1



Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-18 Thread Pavel Machek
Hi!

> > - e1efa87241272104d6a12c8b9fcdc4f62634d447
> 
> Yep, a sync of the dma descriptors before the hardware gets ownership of the 
> tx tail
> idx is missing in the stmmac, too. 

Thanks for the hint. Actually, the driver uses smp_wmb() which is
completely crazy, and probably misses rmb() in clean_tx, too. Anyway,
I did not notice there are dma_ variants, too... we clearly need them.

Thanks and best regards,
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Tobias Klausmann

Hi,

A patch for this is already floating on the ML for a while now latest: 
(ath9k: do not return early to fix rcu unlocking)


Hopefully Kalle will include it in one of his upcoming pull requests.

Greetings,

Tobias


On 18.12.2016 16:59, Paul E. McKenney wrote:

On Sun, Dec 18, 2016 at 02:52:48PM +0100, Gabriel C wrote:

Hello,

while testing kernel 4.9 I run into a weird issue with the ath9k driver.

I can boot the box in console mode and it stay up sometime but is not usable.

Looks to me like someone forgot an rcu_read_unlock() somewhere.  Given that
the unmatched rcu_read_lock() appears in ath_tx_edma_tasklet(), perhaps
that is also where the missing rcu_read_unlock() is.  And sure enough,
in the middle of this function we have the following:

fifo_list = >txq_fifo[txq->txq_tailidx];
if (list_empty(fifo_list)) {
ath_txq_unlock(sc, txq);
return;
}

This will of course return while still in an RCU read-side critical
section.  The caller cannot tell the difference between a return here
and falling off the end of the function, so this is likely the bug.
Or one of the bugs, anyway.  Copying the author and committer for
their thoughts.

Please try the patch at the end of this email.

Thanx, Paul


from dmesg :

===
[ INFO: suspicious RCU usage. ]
4.9-fw1 #1 Tainted: G  I
---
kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
  #0:  (rcu_read_lock){..}, at: [] 
ath_tx_edma_tasklet+0x0/0x460 [ath9k]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I 4.9-fw1 #1
Hardware name: FUJITSU  PRIMERGY TX200 S5 
/D2709, BIOS 6.00 Rev. 1.14.2709  02/04/2013
  88043ee03f38 812cf0f3 81a11540 0001
  88043ee03f68 810b7865 81a55d58 88043efcedc0
  88083cb1ca00 00d1 88043ee03f88 810dbfe8
Call Trace:
  
  [] dump_stack+0x86/0xc3
  [] lockdep_rcu_suspicious+0xc5/0x100
  [] rcu_eqs_enter_common.constprop.62+0x128/0x130
  [] rcu_irq_exit+0x38/0x70
  [] irq_exit+0x74/0xd0
  [] do_IRQ+0x71/0x130
  [] common_interrupt+0x8c/0x8c
  
  [] ? cpuidle_enter_state+0x156/0x220
  [] cpuidle_enter+0x12/0x20
  [] call_cpuidle+0x1e/0x40
  [] cpu_startup_entry+0x11d/0x210
  [] rest_init+0x12c/0x140
  [] start_kernel+0x40f/0x41c
  [] ? early_idt_handler_array+0x120/0x120
  [] x86_64_start_reservations+0x2a/0x2c
  [] x86_64_start_kernel+0xeb/0xf8



commit 5a16fed76936184a7ac22e466cf39bd8bb5ee65e
Author: Paul E. McKenney 
Date:   Sun Dec 18 07:49:00 2016 -0800

 drivers/ath: Add missing rcu_read_unlock() to ath_tx_edma_tasklet()
 
 Commit d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb where possible")

 added rcu_read_lock() and rcu_read_unlock() around the body of
 ath_tx_edma_tasklet(), but failed to add the needed rcu_read_unlock()
 before a "return" in the middle of this function.  This commit therefore
 adds the missing rcu_read_unlock().
 
 Reported-by: Gabriel C 

 Signed-off-by: Paul E. McKenney 
 Cc: Felix Fietkau 
 Cc: Kalle Valo 
 Cc: QCA ath9k Development 
 Cc: 

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c 
b/drivers/net/wireless/ath/ath9k/xmit.c
index 52bfbb988611..857d5ae09a1d 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2787,6 +2787,7 @@ void ath_tx_edma_tasklet(struct ath_softc *sc)
fifo_list = >txq_fifo[txq->txq_tailidx];
if (list_empty(fifo_list)) {
ath_txq_unlock(sc, txq);
+   rcu_read_unlock();
return;
}
  





Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-18 Thread Lino Sanfilippo
Hi,

On 18.12.2016 01:15, Francois Romieu wrote:
> Pavel Machek  :
> [...]
>> Won't this up/down the interface, in a way userspace can observe?
> 
> It won't up/down the interface as it doesn't exactly mimic what the
> network code does (there's more than rtnl_lock).
> 

Right. Userspace wont see link down/up, but it will see carrier off/on.
But this is AFAIK acceptable for a rare situation like a tx error.

> For the same reason it's broken if it races with the transmit path: it
> can release driver resources while the transmit path uses these.
> 
> Btw the points below may not matter/hurt much for a proof a concept
> but they would need to be addressed as well:
> 1) unchecked (and avoidable) extra error paths due to stmmac_release()
> 2) racy cancel_work_sync. Low probability as it is, an irq + error could
>take place right after cancel_work_sync

It was indeed only meant as a proof of concept. Nevertheless the race is not 
good, since one can run into it when faking the tx error for testings purposes.
So below is a slightly improved version of the restart handling.
Its not meant as a final version either. But maybe we can use it as a starting
point.

> Lino, have you considered via-rhine.c since its "move work from irq to
> workqueue context" changes that started in
> 7ab87ff4c770eed71e3777936299292739fcd0fe [*] ?
> It's a shameless plug - originated in r8169.c - but it should be rather
> close to what the sxgbe and friends require. Thought / opinion ?
> 

Not really. There are a few drivers that I use to look into if I want to know
how certain things are done correctly (e.g the sky2 or the intel drivers) 
because
I think they are well implemented.
But you seem to have put some thoughts into various race condition problems
in the via-rhine driver and I can image that sxgbe and stmmac still have some
of these issues, too.

> [*] Including fixes/changes in:
> - 3a5a883a8a663b930908cae4abe5ec913b9b2fd2

Ok, the issues you fixed here are concerning the tx_curr and tx_dirty
pointers. For now this is not needed in stmmac and sxgbe since the
tx completion handlers in both drivers are not lock-free like in 
the via-rhine.c but are synchronized with xmit by means of the xmit_lock.

> - e1efa87241272104d6a12c8b9fcdc4f62634d447

Yep, a sync of the dma descriptors before the hardware gets ownership of the tx 
tail
idx is missing in the stmmac, too. 

> - 810f19bcb862f8889b27e0c9d9eceac9593925dd
> - e45af497950a89459a0c4b13ffd91e1729fffef4
> - a926592f5e4e900f3fa903298c4619a131e60963

I think we should use netif_tx_disable() instead of netif_stop_queue(), too, in 
case of restart to avoid a possible schedule of the xmit function while we 
restart. 
So this is also part of the new patch.

Again the patch is only compile tested.

Regards,
Lino

---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 95 +++
 2 files changed, 63 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index eab04ae..9c240d7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -131,6 +131,7 @@ struct stmmac_priv {
u32 rx_tail_addr;
u32 tx_tail_addr;
u32 mss;
+   struct work_struct tx_err_work;
 
 #ifdef CONFIG_DEBUG_FS
struct dentry *dbgfs_dir;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..5762750 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1403,37 +1403,6 @@ static inline void stmmac_disable_dma_irq(struct 
stmmac_priv *priv)
 }
 
 /**
- * stmmac_tx_err - to manage the tx error
- * @priv: driver private structure
- * Description: it cleans the descriptors and restarts the transmission
- * in case of transmission errors.
- */
-static void stmmac_tx_err(struct stmmac_priv *priv)
-{
-   int i;
-   netif_stop_queue(priv->dev);
-
-   priv->hw->dma->stop_tx(priv->ioaddr);
-   dma_free_tx_skbufs(priv);
-   for (i = 0; i < DMA_TX_SIZE; i++)
-   if (priv->extend_desc)
-   priv->hw->desc->init_tx_desc(>dma_etx[i].basic,
-priv->mode,
-(i == DMA_TX_SIZE - 1));
-   else
-   priv->hw->desc->init_tx_desc(>dma_tx[i],
-priv->mode,
-(i == DMA_TX_SIZE - 1));
-   priv->dirty_tx = 0;
-   priv->cur_tx = 0;
-   netdev_reset_queue(priv->dev);
-   priv->hw->dma->start_tx(priv->ioaddr);
-
-   priv->dev->stats.tx_errors++;
-   netif_wake_queue(priv->dev);
-}
-
-/**
  * stmmac_dma_interrupt - DMA ISR
  * @priv: driver 

Re: [PATCH net-next v4 2/2] net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable

2016-12-18 Thread Martin Blumenstingl
On Sun, Dec 18, 2016 at 4:49 PM, David Miller  wrote:
> From: Martin Blumenstingl 
> Date: Sat, 17 Dec 2016 19:21:19 +0100
>
>> Prior to this patch we were using a hardcoded RGMII TX clock delay of
>> 2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for
>> many boards, but unfortunately not for all (due to the way the actual
>> circuit is designed, sometimes because the TX delay is enabled in the
>> PHY, etc.). Making the TX delay on the MAC side configurable allows us
>> to support all possible hardware combinations.
>>
>> This allows fixing a compatibility issue on some boards, where the
>> RTL8211F PHY is configured to generate the TX delay. We can now turn
>> off the TX delay in the MAC, because otherwise we would be applying the
>> delay twice (which results in non-working TX traffic).
>>
>> Signed-off-by: Martin Blumenstingl 
>> Tested-by: Neil Armstrong 
>
> Is this really the safest thing to do?
>
> If you say the existing hard-coded setting of 1/4 cycle works on most
> boards, and what you're trying to do is override it with an OF
> property value for boards where the existing setting does not work,
> then you _must_ use a default value that corresponds to what the
> existing code does not when you don't see this new OF property.
it's a bit more complicated in reality: 1/4 cycle works when the TX
delay of the RTL8211F PHY is turned off (until recently it was always
enabled for phy-mode RGMII).

> So please retain the current behavior of the 1/4 cycle TX delay
> setting when you don't see the amlogic,tx-delay-ns property.
>
> I really think you risk breaking existing boards by not doing so,
> unless you can have this patch tested on every such board that exists
> and I don't think you really can feasibly and rigorously do that.
there's a patch in my follow-up series which adds the 2ns to the .dts
for all RGMII based boards: [0] (and I would keep these even if we had
a default value, just to make it explicit and thus easier to
understand for other people).
however, we can add the 2ns default back (I can do this if you want -
Rob Herring was unhappy with the missing documentation of this default
value [1] - so note to myself: take care of that as well). but then we
have to decide when to apply this default value: only when we're in
RGMII mode or also in any of the RGMII_*ID modes?

please let me know how we should proceed


Regards,
Martin


[0] http://lists.infradead.org/pipermail/linux-amlogic/2016-December/001838.html
[1] http://lists.infradead.org/pipermail/linux-amlogic/2016-November/001817.html


Re: regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Paul E. McKenney
On Sun, Dec 18, 2016 at 02:52:48PM +0100, Gabriel C wrote:
> Hello,
> 
> while testing kernel 4.9 I run into a weird issue with the ath9k driver.
> 
> I can boot the box in console mode and it stay up sometime but is not usable.

Looks to me like someone forgot an rcu_read_unlock() somewhere.  Given that
the unmatched rcu_read_lock() appears in ath_tx_edma_tasklet(), perhaps
that is also where the missing rcu_read_unlock() is.  And sure enough,
in the middle of this function we have the following:

fifo_list = >txq_fifo[txq->txq_tailidx];
if (list_empty(fifo_list)) {
ath_txq_unlock(sc, txq);
return;
}

This will of course return while still in an RCU read-side critical
section.  The caller cannot tell the difference between a return here
and falling off the end of the function, so this is likely the bug.
Or one of the bugs, anyway.  Copying the author and committer for
their thoughts.

Please try the patch at the end of this email.

Thanx, Paul

> from dmesg :
> 
> ===
> [ INFO: suspicious RCU usage. ]
> 4.9-fw1 #1 Tainted: G  I
> ---
> kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.!
> 
> other info that might help us debug this:
> 
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 1
> RCU used illegally from extended quiescent state!
> 1 lock held by swapper/0/0:
>  #0:  (rcu_read_lock){..}, at: [] 
> ath_tx_edma_tasklet+0x0/0x460 [ath9k]
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I 4.9-fw1 #1
> Hardware name: FUJITSU  PRIMERGY TX200 S5 
> /D2709, BIOS 6.00 Rev. 1.14.2709  02/04/2013
>  88043ee03f38 812cf0f3 81a11540 0001
>  88043ee03f68 810b7865 81a55d58 88043efcedc0
>  88083cb1ca00 00d1 88043ee03f88 810dbfe8
> Call Trace:
>  
>  [] dump_stack+0x86/0xc3
>  [] lockdep_rcu_suspicious+0xc5/0x100
>  [] rcu_eqs_enter_common.constprop.62+0x128/0x130
>  [] rcu_irq_exit+0x38/0x70
>  [] irq_exit+0x74/0xd0
>  [] do_IRQ+0x71/0x130
>  [] common_interrupt+0x8c/0x8c
>  
>  [] ? cpuidle_enter_state+0x156/0x220
>  [] cpuidle_enter+0x12/0x20
>  [] call_cpuidle+0x1e/0x40
>  [] cpu_startup_entry+0x11d/0x210
>  [] rest_init+0x12c/0x140
>  [] start_kernel+0x40f/0x41c
>  [] ? early_idt_handler_array+0x120/0x120
>  [] x86_64_start_reservations+0x2a/0x2c
>  [] x86_64_start_kernel+0xeb/0xf8



commit 5a16fed76936184a7ac22e466cf39bd8bb5ee65e
Author: Paul E. McKenney 
Date:   Sun Dec 18 07:49:00 2016 -0800

drivers/ath: Add missing rcu_read_unlock() to ath_tx_edma_tasklet()

Commit d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb where possible")
added rcu_read_lock() and rcu_read_unlock() around the body of
ath_tx_edma_tasklet(), but failed to add the needed rcu_read_unlock()
before a "return" in the middle of this function.  This commit therefore
adds the missing rcu_read_unlock().

Reported-by: Gabriel C 
Signed-off-by: Paul E. McKenney 
Cc: Felix Fietkau 
Cc: Kalle Valo 
Cc: QCA ath9k Development 
Cc: 

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c 
b/drivers/net/wireless/ath/ath9k/xmit.c
index 52bfbb988611..857d5ae09a1d 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2787,6 +2787,7 @@ void ath_tx_edma_tasklet(struct ath_softc *sc)
fifo_list = >txq_fifo[txq->txq_tailidx];
if (list_empty(fifo_list)) {
ath_txq_unlock(sc, txq);
+   rcu_read_unlock();
return;
}
 



Re: [PATCH net-next v4 2/2] net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable

2016-12-18 Thread David Miller
From: Martin Blumenstingl 
Date: Sat, 17 Dec 2016 19:21:19 +0100

> Prior to this patch we were using a hardcoded RGMII TX clock delay of
> 2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for
> many boards, but unfortunately not for all (due to the way the actual
> circuit is designed, sometimes because the TX delay is enabled in the
> PHY, etc.). Making the TX delay on the MAC side configurable allows us
> to support all possible hardware combinations.
> 
> This allows fixing a compatibility issue on some boards, where the
> RTL8211F PHY is configured to generate the TX delay. We can now turn
> off the TX delay in the MAC, because otherwise we would be applying the
> delay twice (which results in non-working TX traffic).
> 
> Signed-off-by: Martin Blumenstingl 
> Tested-by: Neil Armstrong 

Is this really the safest thing to do?

If you say the existing hard-coded setting of 1/4 cycle works on most
boards, and what you're trying to do is override it with an OF
property value for boards where the existing setting does not work,
then you _must_ use a default value that corresponds to what the
existing code does not when you don't see this new OF property.

So please retain the current behavior of the 1/4 cycle TX delay
setting when you don't see the amlogic,tx-delay-ns property.

I really think you risk breaking existing boards by not doing so,
unless you can have this patch tested on every such board that exists
and I don't think you really can feasibly and rigorously do that.

Thanks.


Re: [PATCH] qed: fix memory leak of a qed_spq_entry on error failure paths

2016-12-18 Thread David Miller
From: "Mintz, Yuval" 
Date: Sun, 18 Dec 2016 06:33:50 +

>> From: Colin Ian King 
>> 
>> A qed_spq_entry entry is allocated by qed_sp_init_request but is not kfree'd
>> if an error occurs, causing a memory leak. Fix this by kfree'ing it and also
>> setting *pp_ent to NULL to be safe.
>> 
>> Found with static analysis by CoverityScan, CIDs 1389468-1389470
>> 
>> Signed-off-by: Colin Ian King 
> ...
>> +err:
>> +kfree(*pp_ent);
>> +*pp_ent = NULL;
>> +
>> +return rc;
>>  }
> 
> Hi Colin - thanks for this.
> It would have been preferable to return the previously allocated spq entry.
> I.e., do:
> 
> +err:
> + qed_spq_return_entry(p_hwfn, *pp_ent);
> + *pp_ent = NULL;
> + return rc;

Looking at this last night, I came to the same conclusion.


regression: ath_tx_edma_tasklet() Illegal idle entry in RCU read-side critical section

2016-12-18 Thread Gabriel C

Hello,

while testing kernel 4.9 I run into a weird issue with the ath9k driver.

I can boot the box in console mode and it stay up sometime but is not usable.


from dmesg :

===
[ INFO: suspicious RCU usage. ]
4.9-fw1 #1 Tainted: G  I
---
kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
 #0:  (rcu_read_lock){..}, at: [] 
ath_tx_edma_tasklet+0x0/0x460 [ath9k]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I 4.9-fw1 #1
Hardware name: FUJITSU  PRIMERGY TX200 S5 
/D2709, BIOS 6.00 Rev. 1.14.2709  02/04/2013
 88043ee03f38 812cf0f3 81a11540 0001
 88043ee03f68 810b7865 81a55d58 88043efcedc0
 88083cb1ca00 00d1 88043ee03f88 810dbfe8
Call Trace:
 
 [] dump_stack+0x86/0xc3
 [] lockdep_rcu_suspicious+0xc5/0x100
 [] rcu_eqs_enter_common.constprop.62+0x128/0x130
 [] rcu_irq_exit+0x38/0x70
 [] irq_exit+0x74/0xd0
 [] do_IRQ+0x71/0x130
 [] common_interrupt+0x8c/0x8c
 
 [] ? cpuidle_enter_state+0x156/0x220
 [] cpuidle_enter+0x12/0x20
 [] call_cpuidle+0x1e/0x40
 [] cpu_startup_entry+0x11d/0x210
 [] rest_init+0x12c/0x140
 [] start_kernel+0x40f/0x41c
 [] ? early_idt_handler_array+0x120/0x120
 [] x86_64_start_reservations+0x2a/0x2c
 [] x86_64_start_kernel+0xeb/0xf8

...

perf: interrupt took too long (2766 > 2500), lowering 
kernel.perf_event_max_sample_rate to 72000
perf: interrupt took too long (3510 > 3457), lowering 
kernel.perf_event_max_sample_rate to 56000
perf: interrupt took too long (4689 > 4387), lowering 
kernel.perf_event_max_sample_rate to 42000
perf: interrupt took too long (5980 > 5861), lowering 
kernel.perf_event_max_sample_rate to 33000
INFO: rcu_preempt detected stalls on CPUs/tasks:
Tasks blocked on level-0 rcu_node (CPUs 0-15): P0
(detected by 5, t=65002 jiffies, g=3241, c=3240, q=8520)
swapper/0   R  running task0 0  0 0x
 81a03e90 8139bf30 81ae30b8 810253a9
 88083cb1e600 81ae30a0 0002 81ae30b8
 81ae2fe0 81a03ed0 81472814 001823671b47
Call Trace:
 [] ? acpi_idle_enter+0x116/0x1fb
 [] ? cpuidle_enter_state+0x134/0x220
 [] ? cpuidle_enter+0x12/0x20
 [] ? call_cpuidle+0x1e/0x40
 [] ? cpu_startup_entry+0x11d/0x210
 [] ? rest_init+0x12c/0x140
 [] ? start_kernel+0x40f/0x41c
 [] ? early_idt_handler_array+0x120/0x120
 [] ? x86_64_start_reservations+0x2a/0x2c
 [] ? x86_64_start_kernel+0xeb/0xf8
swapper/0   R  running task0 0  0 0x
 81a03e90 8139bf30 81ae30b8 810253a9
 88083cb1e600 81ae30a0 0002 81ae30b8
 81ae2fe0 81a03ed0 81472814 001823671b47
Call Trace:
 [] ? acpi_idle_enter+0x116/0x1fb
 [] ? cpuidle_enter_state+0x134/0x220
 [] ? cpuidle_enter+0x12/0x20
 [] ? call_cpuidle+0x1e/0x40
 [] ? cpu_startup_entry+0x11d/0x210
 [] ? rest_init+0x12c/0x140
 [] ? start_kernel+0x40f/0x41c
 [] ? early_idt_handler_array+0x120/0x120
 [] ? x86_64_start_reservations+0x2a/0x2c
 [] ? x86_64_start_kernel+0xeb/0xf8
perf: interrupt took too long (7746 > 7475), lowering 
kernel.perf_event_max_sample_rate to 25000
systemd-hostnamed.service: State 'stop-sigterm' timed out. Killing.
systemd-hostnamed.service: Killing process 1507 (systemd-hostnam) with signal 
SIGKILL.
perf: interrupt took too long (10065 > 9682), lowering 
kernel.perf_event_max_sample_rate to 19000
perf: interrupt took too long (12596 > 12581), lowering 
kernel.perf_event_max_sample_rate to 15000
INFO: task systemd-hostnam:1507 blocked for more than 120 seconds.
  Tainted: G  I 4.9-fw1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-hostnam D0  1507  1 0x0002
 88043a29f200 c460 88043ab0a1c0 88043cdc
 88043f9d6718 c9000b67fb88 8157ff6e c9000b67fbf8
 88043ab0abc0 88043f9d6718  88043ab0a1c0
Call Trace:
 [] ? __schedule+0x2ce/0x810
 [] schedule+0x3b/0x90
 [] schedule_timeout+0x222/0x3a0
 [] ? debug_smp_processor_id+0x17/0x20
 [] ? debug_smp_processor_id+0x17/0x20
 [] ? get_lock_stats+0x19/0x50
 [] ? _raw_spin_unlock_irq+0x27/0x50
 [] ? __this_cpu_preempt_check+0x13/0x20
 [] ? trace_hardirqs_on_caller+0xef/0x200
 [] wait_for_common+0xca/0x180
 [] ? wake_up_q+0x80/0x80
 [] wait_for_completion+0x18/0x20
 [] __wait_rcu_gp+0xc5/0x100
 [] synchronize_rcu.part.53+0x2d/0x50
 [] ? __call_rcu.constprop.59+0x270/0x270
 [] ? rcu_panic+0x20/0x20
 [] ? wait_for_common+0x39/0x180
 [] synchronize_rcu+0x27/0x90
 [] namespace_unlock+0x47/0x60
 [] 

Re: [PATCH net-next v3 0/4] Fix OdroidC2 Gigabit Tx link issue

2016-12-18 Thread Martin Blumenstingl
Hi Florian, Hi Jerome,

On Wed, Nov 30, 2016 at 2:15 AM, Florian Fainelli  wrote:
> On 11/29/2016 05:13 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Tue, 29 Nov 2016 16:43:20 -0800
>>
>>> On 11/29/2016 04:38 PM, David Miller wrote:
 From: Jerome Brunet 
 Date: Mon, 28 Nov 2016 10:46:45 +0100

> This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
> The platform seems to enter LPI on the Rx path too often while performing
> relatively high TX transfer. This eventually break the link (both Tx and
> Rx), and require to bring the interface down and up again to get the Rx
> path working again.
>
> The root cause of this issue is not fully understood yet but disabling EEE
> advertisement on the PHY prevent this feature to be negotiated.
> With this change, the link is stable and reliable, with the expected
> throughput performance.
>
> The patchset adds options in the generic phy driver to disable EEE
> advertisement, through device tree. The way it is done is very similar
> to the handling of the max-speed property.

 Patches 1-3 applied to net-next, thanks.
>>>
>>> Meh, there was a v4 submitted shortly after, and I objected to the whole
>>> idea of using that kind of Device Tree properties to disable EEE, we can
>>> send reverts though..
>>
>> Sorry, I lost this in all the discussion, I can revert.
>
> Yeah, I can understand why, these freaking PHYs tend to generate a lot
> of noise and discussion...
>
>>
>> Just send me a revert of the entire merge commit
>> a152c91889556df17ca6d8ea134fb2cb4ac9f893 with a short
>> description of why and I'll apply it.
>
> OK, I will talk with Jerome first to make sure that we are in agreement
> with the solution to deploy to fix the OdroidC2 problem first.
simply because I'm curious: what was the outcome of your discussion?
can we stay with the current solution or are any changes required?


Regards,
Martin


Re: [PATCH iproute2 2/2] tc/m_tunnel_key: Add dest UDP port to tunnel key action

2016-12-18 Thread Hadar Hen Zion



On 12/15/2016 3:53 PM, Simon Horman wrote:

On Thu, Dec 15, 2016 at 02:03:36PM +0100, Simon Horman wrote:

On Tue, Dec 13, 2016 at 10:07:47AM +0200, Hadar Hen Zion wrote:

Enhance tunnel key action parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion 
Reviewed-by: Roi Dayan 

Hi,

this looks good to me but could you also update tc/m_tunnel_key.c:usage(); ?

It seems that I was a bit hasty here as I now see that Stephen has
indicated that he has applied this series. I also notice that
patch 1/2 of this series also misses updating usage(). Let me know
if sending some follow-up patches is the best way forwards.

Yes, I you are right, I'll send a follow-up patches.
Thanks,
Hadar



Re: wl1251 & mac address & calibration data

2016-12-18 Thread Pali Rohár
On Sunday 18 December 2016 12:54:00 Arend Van Spriel wrote:
> On 18-12-2016 12:04, Pali Rohár wrote:
> > On Sunday 18 December 2016 11:49:53 Arend Van Spriel wrote:
> >> On 16-12-2016 11:40, Pali Rohár wrote:
> >>> On Friday 16 December 2016 08:25:44 Daniel Wagner wrote:
>  On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
> > For the new API a solution for "fallback mechanisms" should be
> > clean though and I am looking to stay as far as possible from
> > the existing mess. A solution to help both the old API and new
> > API is possible for the "fallback mechanism" though -- but for
> > that I can only refer you at this point to some of Daniel
> > Wagner and Tom Gunderson's firmwared deamon prospect. It
> > should help pave the way for a clean solution and help address
> > other stupid issues.
>  
>  The firmwared project is hosted here
>  
>  https://github.com/teg/firmwared
>  
>  As Luis pointed out, firmwared relies on
>  FW_LOADER_USER_HELPER_FALLBACK, which is not enabled by default.
> >>> 
> >>> I know. But it does not mean that I cannot enable this option at
> >>> kernel compile time.
> >>> 
> >>> Bigger problem is that currently request_firmware() first try to
> >>> load firmware directly from VFS and after that (if fails)
> >>> fallback to user helper.
> >>> 
> >>> So I would need to extend kernel firmware code with new function
> >>> (or flag) to not use VFS and try only user mode helper.
> >> 
> >> Why do you need the user-mode helper anyway. This is all static
> >> data, right?
> > 
> > Those are static data, but device specific!
> 
> So what?
> 
> >> So why not cook up a firmware file in user-space once and put
> >> it in /lib/firmware for the driver to request directly.
> > 
> > 1. Violates FHS
> 
> How?
> 
> > 2. Does not work for readonly /, readonly /lib, readonly
> > /lib/firmware
> 
> Que?
> 
> > 3. Backup & restore of rootfs between same devices does not work
> > (as rootfs now contains device specific data).
> 
> True.
> 
> > 4. Sharing one rootfs (either via nfs or other technology) does not
> > work for more devices (even in state when rootfs is used only by
> > one device at one time).
> 
> Indeed.
> 
> > And it is common that N900 developers have rootfs in laptop and via
> > usb (cdc_ether) exports it over nfs to N900 device and boot
> > system. It basically break booting from one nfs-exported rootfs,
> > as that export become model specific...
> 
> These are all you choices and more a logistic issue. If your take is
> that udev is the way to solve those, fine by me.
> 
> >> Seems a bit
> >> overkill to have a {e,}udev or whatever daemon running if the
> >> result is always the same. Just my 2 cents.
> > 
> > No it is not. It will break couple of other things in Linux and
> > device
> 
> Now I am curious. What "couple of other things" will be broken.
> 
> > and model specific calibration data should not be in /lib/firmware!
> > That directory is used for firmware files, not calibration.
> 
> What is "firmware"? Really. These are binary blobs required to make
> the device work. And guess what, your device needs calibration data.
> Why make the distinction.
> 
> Regards,
> Arend

File wl1251-nvs.bin is provided by linux-firmware package and contains 
default data which should be overriden by model specific calibrated 
data.

But overwriting that one file is not possible as it next update of 
linux-firmware package will overwrite it back. It break any normal usage 
of package management.

Also it is ridiculously broken by design if some "boot" files needs to 
be overwritten to initialize hardware properly. To not break booting you 
need to overwrite that file before first boot. But without booting 
device you cannot read calibration data. So some hack with autoreboot 
after boot is needed. And how to detect that we have real overwritten 
calibration data and not default one from linux-firmware? Any heuristic 
or checks will be broken here. And no, nothing like you need to reboot 
your device now (and similar concept) from windows world is not 
accepted.

"firmware" is one for chip. Any N900 device with wl1251 chip needs 
exactly same firmware "wl1251-fw.bin". But every N900 needs different 
calibration data which is not firmware.

-- 
Pali Rohár
pali.ro...@gmail.com


signature.asc
Description: This is a digitally signed message part.


Re: wl1251 & mac address & calibration data

2016-12-18 Thread Arend Van Spriel
On 18-12-2016 12:04, Pali Rohár wrote:
> On Sunday 18 December 2016 11:49:53 Arend Van Spriel wrote:
>> On 16-12-2016 11:40, Pali Rohár wrote:
>>> On Friday 16 December 2016 08:25:44 Daniel Wagner wrote:
 On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
> For the new API a solution for "fallback mechanisms" should be
> clean though and I am looking to stay as far as possible from the
> existing mess. A solution to help both the old API and new API is
> possible for the "fallback mechanism" though -- but for that I
> can only refer you at this point to some of Daniel Wagner and
> Tom Gunderson's firmwared deamon prospect. It should help pave
> the way for a clean solution and help address other stupid
> issues.

 The firmwared project is hosted here

 https://github.com/teg/firmwared

 As Luis pointed out, firmwared relies on
 FW_LOADER_USER_HELPER_FALLBACK, which is not enabled by default.
>>>
>>> I know. But it does not mean that I cannot enable this option at
>>> kernel compile time.
>>>
>>> Bigger problem is that currently request_firmware() first try to
>>> load firmware directly from VFS and after that (if fails) fallback
>>> to user helper.
>>>
>>> So I would need to extend kernel firmware code with new function
>>> (or flag) to not use VFS and try only user mode helper.
>>
>> Why do you need the user-mode helper anyway. This is all static data,
>> right?
> 
> Those are static data, but device specific!

So what?

>> So why not cook up a firmware file in user-space once and put
>> it in /lib/firmware for the driver to request directly.
> 
> 1. Violates FHS

How?

> 2. Does not work for readonly /, readonly /lib, readonly /lib/firmware

Que?

> 3. Backup & restore of rootfs between same devices does not work (as 
> rootfs now contains device specific data).

True.

> 4. Sharing one rootfs (either via nfs or other technology) does not work 
> for more devices (even in state when rootfs is used only by one device 
> at one time).

Indeed.

> And it is common that N900 developers have rootfs in laptop and via usb 
> (cdc_ether) exports it over nfs to N900 device and boot system. It 
> basically break booting from one nfs-exported rootfs, as that export 
> become model specific...

These are all you choices and more a logistic issue. If your take is
that udev is the way to solve those, fine by me.

>> Seems a bit
>> overkill to have a {e,}udev or whatever daemon running if the result
>> is always the same. Just my 2 cents.
> 
> No it is not. It will break couple of other things in Linux and device 

Now I am curious. What "couple of other things" will be broken.

> and model specific calibration data should not be in /lib/firmware! That 
> directory is used for firmware files, not calibration.

What is "firmware"? Really. These are binary blobs required to make the
device work. And guess what, your device needs calibration data. Why
make the distinction.

Regards,
Arend



Re: wl1251 & mac address & calibration data

2016-12-18 Thread Pali Rohár
On Sunday 18 December 2016 11:49:53 Arend Van Spriel wrote:
> On 16-12-2016 11:40, Pali Rohár wrote:
> > On Friday 16 December 2016 08:25:44 Daniel Wagner wrote:
> >> On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
> >>> For the new API a solution for "fallback mechanisms" should be
> >>> clean though and I am looking to stay as far as possible from the
> >>> existing mess. A solution to help both the old API and new API is
> >>> possible for the "fallback mechanism" though -- but for that I
> >>> can only refer you at this point to some of Daniel Wagner and
> >>> Tom Gunderson's firmwared deamon prospect. It should help pave
> >>> the way for a clean solution and help address other stupid
> >>> issues.
> >> 
> >> The firmwared project is hosted here
> >> 
> >> https://github.com/teg/firmwared
> >> 
> >> As Luis pointed out, firmwared relies on
> >> FW_LOADER_USER_HELPER_FALLBACK, which is not enabled by default.
> > 
> > I know. But it does not mean that I cannot enable this option at
> > kernel compile time.
> > 
> > Bigger problem is that currently request_firmware() first try to
> > load firmware directly from VFS and after that (if fails) fallback
> > to user helper.
> > 
> > So I would need to extend kernel firmware code with new function
> > (or flag) to not use VFS and try only user mode helper.
> 
> Why do you need the user-mode helper anyway. This is all static data,
> right?

Those are static data, but device specific!

> So why not cook up a firmware file in user-space once and put
> it in /lib/firmware for the driver to request directly.

1. Violates FHS

2. Does not work for readonly /, readonly /lib, readonly /lib/firmware

3. Backup & restore of rootfs between same devices does not work (as 
rootfs now contains device specific data).

4. Sharing one rootfs (either via nfs or other technology) does not work 
for more devices (even in state when rootfs is used only by one device 
at one time).

And it is common that N900 developers have rootfs in laptop and via usb 
(cdc_ether) exports it over nfs to N900 device and boot system. It 
basically break booting from one nfs-exported rootfs, as that export 
become model specific...

> Seems a bit
> overkill to have a {e,}udev or whatever daemon running if the result
> is always the same. Just my 2 cents.

No it is not. It will break couple of other things in Linux and device 
and model specific calibration data should not be in /lib/firmware! That 
directory is used for firmware files, not calibration.

-- 
Pali Rohár
pali.ro...@gmail.com


signature.asc
Description: This is a digitally signed message part.


Re: wl1251 & mac address & calibration data

2016-12-18 Thread Arend Van Spriel
On 16-12-2016 11:40, Pali Rohár wrote:
> On Friday 16 December 2016 08:25:44 Daniel Wagner wrote:
>> On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
>>> For the new API a solution for "fallback mechanisms" should be
>>> clean though and I am looking to stay as far as possible from the
>>> existing mess. A solution to help both the old API and new API is
>>> possible for the "fallback mechanism" though -- but for that I can
>>> only refer you at this point to some of Daniel Wagner and Tom
>>> Gunderson's firmwared deamon prospect. It should help pave the way
>>> for a clean solution and help address other stupid issues.
>>
>> The firmwared project is hosted here
>>
>> https://github.com/teg/firmwared
>>
>> As Luis pointed out, firmwared relies on
>> FW_LOADER_USER_HELPER_FALLBACK, which is not enabled by default.
> 
> I know. But it does not mean that I cannot enable this option at kernel 
> compile time.
> 
> Bigger problem is that currently request_firmware() first try to load 
> firmware directly from VFS and after that (if fails) fallback to user 
> helper.
> 
> So I would need to extend kernel firmware code with new function (or 
> flag) to not use VFS and try only user mode helper.

Why do you need the user-mode helper anyway. This is all static data,
right? So why not cook up a firmware file in user-space once and put it
in /lib/firmware for the driver to request directly. Seems a bit
overkill to have a {e,}udev or whatever daemon running if the result is
always the same. Just my 2 cents.

Regards,
Arend


Re: mlx4: Bug in XDP_TX + 16 rx-queues

2016-12-18 Thread Tariq Toukan

Hi Martin,


On 17/12/2016 12:18 PM, Martin KaFai Lau wrote:

Hi All,

I have been debugging with XDP_TX and 16 rx-queues.

1) When 16 rx-queues is used and an XDP prog is doing XDP_TX,
it seems that the packet cannot be XDP_TX out if the pkt
is received from some particular CPUs (/rx-queues).

Does the rx_xdp_tx_full counter increase?
Does the problem repro if you turn off PFC?
ethtool -A  rx off tx off


2) If 8 rx-queues is used, it does not have problem.

3) The 16 rx-queues problem also went away after reverting these
two patches:
15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases
67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme

4) I can reproduce the problem by running samples/bof/xdp_ip_tunnel at
the receiver side.  The sender side sends out TCP packets with
source port ranging from 1 to 1024.  At the sender side also, do
a tcpdump to capture the ip-tunnel packet reflected by xdp_ip_tunnel.
With 8 rx-queues,  I can get all 1024 packets back.  With 16 rx-queues,
I can only get 512 packets back.  It is a 40 CPUs machine.
I also checked the rx*_xdp_tx counters (from ethtool -S eth0) to ensure
the xdp prog has XDP_TX-ed it out.
So all packets were transmitted (according to rx*_xdp_tx), and only half 
the of them received on the other side?


Not saying that 67f8b1dcb9ee is 100% the cause because there are other
changes since then.  It is merely a brain dump on what I have already
tried.

Tariq/Saeed, any thoughts?  I can easily test some patches in
my setup.

Thanks,
--Martin

Thanks,
Tariq