Re: [PATCH net-next 0/8] net: Extend availability of PHY statistics

2018-04-26 Thread Nikita Yushchenko
> Hi all,
> 
> This patch series adds support for retrieving PHY statistics with DSA switches
> when the CPU port uses a PHY to PHY connection (as opposed to MAC to MAC).
> To get there a number of things are done:

Tested-By: Nikita Yushchenko <nikita.yo...@cogentembedded.com>

Checked on RDU1 and RDU2.
FEC PHY statistics keeps shown.
No new statistics appears but it is not expected with Marvell switch.

Nikita


Re: Kernel crashes in phy_attach_direct()

2017-02-08 Thread Nikita Yushchenko
> Hi,
>
> I am getting the following kernel crash with linux-next 20170208
> running on a imx53-qsb board.
>
> Any ideas?

Same here, on zii-dev-revB and zii-dev-revC boards.

Crash is in phy_attach_direct() in line

if (!try_module_get(d->driver->owner)) {

caused by d->driver being NULL.

Nikita


Re: at86rf230: Allow slow GPIO pins for "rstn"

2016-12-21 Thread Nikita Yushchenko
> Driver code never touches "rstn" signal in atomic context, so there's
> no need to implicitly put such restriction on it by using gpio_set_value
> to manipulate it. Replace gpio_set_value to gpio_set_value_cansleep to
> fix that.
> 
> As a an example of where such restriction might be inconvenient,
> consider a hardware design where "rstn" is connected to a pin of I2C/SPI
> GPIO expander chip.
> 
> Cc: Chris Healy <cphe...@gmail.com>
> Signed-off-by: Andrey Smirnov <andrew.smir...@gmail.com>

Reviewed-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>


[PATCH/RFC net-next] net: fec: allow "mini jumbo" frames

2016-12-09 Thread Nikita Yushchenko
This adds support for MTU slightly larger than default, on modern
FEC flavours.

Currently FEC driver uses single hardware Rx buffer per frame. On most
FEC flavours, size of single buffer is limited by 11-bit field, and
has to be multiple of 64 (in the worst case). Thus maximum usable Rx
buffer size is 1984 bytes.

Of those:
- 2 bytes are used for IP header alignment,
- 14 bytes are used by ethhdr,
- up to 8 bytes are needed for VLAN and/or DSA tags,
- 4 bytes are needed for CRC.

Thus maximum MTU possible within current RX architecture is 1956.

This patch allows exactly that. For further increase, Rx architecture
change is needed.

Use of MTU=1956 gives about 1.5% throughput improvement between two Vybrid
boards, compared to default MTU=1500.

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
 drivers/net/ethernet/freescale/fec.h  |  2 +
 drivers/net/ethernet/freescale/fec_main.c | 69 ---
 2 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h 
b/drivers/net/ethernet/freescale/fec.h
index 5ea740b4cf14..72c918bd10f3 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -557,6 +557,8 @@ struct fec_enet_private {
unsigned int tx_align;
unsigned int rx_align;
 
+   unsigned int max_fl;
+
/* hw interrupt coalesce */
unsigned int rx_pkts_itr;
unsigned int rx_time_itr;
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 38160c2bebcb..6a299a4d75ed 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -171,29 +171,20 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #endif
 #endif /* CONFIG_M5272 */
 
-/* The FEC stores dest/src/type/vlan, data, and checksum for receive packets.
- */
-#define PKT_MAXBUF_SIZE1522
-#define PKT_MINBUF_SIZE64
-#define PKT_MAXBLR_SIZE1536
-
 /* FEC receive acceleration */
 #define FEC_RACC_IPDIS (1 << 1)
 #define FEC_RACC_PRODIS(1 << 2)
 #define FEC_RACC_SHIFT16   BIT(7)
 #define FEC_RACC_OPTIONS   (FEC_RACC_IPDIS | FEC_RACC_PRODIS)
 
-/*
- * The 5270/5271/5280/5282/532x RX control register also contains maximum frame
- * size bits. Other FEC hardware does not, so we need to take that into
- * account when setting it.
+/* Difference between buffer size and MTU.
+ * This accounts:
+ * - possible 2 byte alignment,
+ * - standard ethernet header,
+ * - up to 8 bytes of VLAN or DSA tags,
+ * - checksum
  */
-#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || 
\
-defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
-#defineOPT_FRAME_SIZE  (PKT_MAXBUF_SIZE << 16)
-#else
-#defineOPT_FRAME_SIZE  0
-#endif
+#define FEC_BUFFER_OVERHEAD(2 + ETH_HLEN + 8 + ETH_FCS_LEN)
 
 /* FEC MII MMFR bits definition */
 #define FEC_MMFR_ST(1 << 30)
@@ -847,7 +838,7 @@ static void fec_enet_enable_ring(struct net_device *ndev)
for (i = 0; i < fep->num_rx_queues; i++) {
rxq = fep->rx_queue[i];
writel(rxq->bd.dma, fep->hwp + FEC_R_DES_START(i));
-   writel(PKT_MAXBLR_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
+   writel(fep->max_fl, fep->hwp + FEC_R_BUFF_SIZE(i));
 
/* enable DMA1/2 */
if (i)
@@ -895,8 +886,18 @@ fec_restart(struct net_device *ndev)
struct fec_enet_private *fep = netdev_priv(ndev);
u32 val;
u32 temp_mac[2];
-   u32 rcntl = OPT_FRAME_SIZE | 0x04;
+   u32 rcntl = 0x04;
u32 ecntl = 0x2; /* ETHEREN */
+/*
+ * The 5270/5271/5280/5282/532x RX control register also contains maximum frame
+ * size bits. Other FEC hardware does not, so we need to take that into
+ * account when setting it.
+ */
+#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || 
\
+defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
+   rcntl |= (fep->max_fl << 16);
+#endif
+
 
/* Whack a reset.  We should wait for this.
 * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
@@ -953,7 +954,7 @@ fec_restart(struct net_device *ndev)
else
val &= ~FEC_RACC_OPTIONS;
writel(val, fep->hwp + FEC_RACC);
-   writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
+   writel(fep->max_fl, fep->hwp + FEC_FTRL);
}
 #endif
 
@@ -1295,7 +1296,10 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct 
bufdesc *bdp, struct sk_buff
if (off)
skb_reserve(skb, fep->rx_align + 1 - off);
 
-   bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(>pdev->dev, 
skb->data, FEC_ENET_RX_FRSI

[patch net v4] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
Commit 80cca775cdc4 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.

However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.

Fix that by adding explicit handling of !defined(CONFIG_M5272) case.

Fixes: 80cca775cdc4 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
Changes from v3:
- fix reference commit id to match upstream tree

 drivers/net/ethernet/freescale/fec_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5f77caa59534..12aef1b15356 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,6 +2313,8 @@ static const struct fec_stat {
{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
+#define FEC_STATS_SIZE (ARRAY_SIZE(fec_stats) * sizeof(u64))
+
 static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
@@ -2330,7 +2332,7 @@ static void fec_enet_get_ethtool_stats(struct net_device 
*dev,
if (netif_running(dev))
fec_enet_update_ethtool_stats(dev);
 
-   memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
+   memcpy(data, fep->ethtool_stats, FEC_STATS_SIZE);
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2355,6 +2357,12 @@ static int fec_enet_get_sset_count(struct net_device 
*dev, int sset)
return -EOPNOTSUPP;
}
 }
+
+#else  /* !defined(CONFIG_M5272) */
+#define FEC_STATS_SIZE 0
+static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 static int fec_enet_nway_reset(struct net_device *dev)
@@ -3293,8 +3301,7 @@ fec_probe(struct platform_device *pdev)
 
/* Init network device */
ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
- ARRAY_SIZE(fec_stats) * sizeof(u64),
- num_tx_qs, num_rx_qs);
+ FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
 
-- 
2.1.4



Re: [patch net v3] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
> +#define FEC_STATS_SIZE (ARRAY_SIZE(fec_stats) * sizeof(u64))
> 
> 
> Do I take it right this actually translates to (sizeof(fec_stats) /
> sizeof(u64) * sizeof(u64))?

No.

fec_stats is an array of structs, each struct has car arrsy and integer,
and size of that is definitely not bytes.


ARRAY_SIZE(fec_stats) is number of stats, i.e. it is returned from
fec_enet_get_sset_count()

Each stat in blob is u64. Thus (ARRAY_SIZE(fec_stats) * sizeof(u64)) is
size of stats blob.


[patch net v3] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.

However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.

Fix that by adding explicit handling of !defined(CONFIG_M5272) case.

Fixes: 4dfb80d18d05 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
Changes since v2:
- fix typo that caused compile problem, double-check that it compiles
  and works as expected.

 drivers/net/ethernet/freescale/fec_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5f77caa59534..12aef1b15356 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,6 +2313,8 @@ static const struct fec_stat {
{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
+#define FEC_STATS_SIZE (ARRAY_SIZE(fec_stats) * sizeof(u64))
+
 static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
@@ -2330,7 +2332,7 @@ static void fec_enet_get_ethtool_stats(struct net_device 
*dev,
if (netif_running(dev))
fec_enet_update_ethtool_stats(dev);
 
-   memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
+   memcpy(data, fep->ethtool_stats, FEC_STATS_SIZE);
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2355,6 +2357,12 @@ static int fec_enet_get_sset_count(struct net_device 
*dev, int sset)
return -EOPNOTSUPP;
}
 }
+
+#else  /* !defined(CONFIG_M5272) */
+#define FEC_STATS_SIZE 0
+static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 static int fec_enet_nway_reset(struct net_device *dev)
@@ -3293,8 +3301,7 @@ fec_probe(struct platform_device *pdev)
 
/* Init network device */
ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
- ARRAY_SIZE(fec_stats) * sizeof(u64),
- num_tx_qs, num_rx_qs);
+ FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
 
-- 
2.1.4



Re: [patch net v2] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
> From: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
> Date: Mon, 5 Dec 2016 16:55:04 +0300
> 
>> Aieee   I was typing too fast today, sorry...
>>
>> send separate "fix for the fix", or re-send patch without that silly typo?
> 
> If the patch hasn't been applied yet, you resend a fixed version of the
> patch, always.

Ok, will repost shortly.

What I don't understand is - how could test robot fetch it before it was
applied?

Nikita


Re: [patch net v2] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
diff --git a/drivers/net/ethernet/freescale/fec_main.c
b/drivers/net/ethernet/freescale/fec_main.c
index 741cf4a57bfc..12aef1b15356 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3301,7 +3301,7 @@ fec_probe(struct platform_device *pdev)

/* Init network device */
ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
- FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
+ FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;


> Aieee   I was typing too fast today, sorry...
> 
> send separate "fix for the fix", or re-send patch without that silly typo?
> 
> Nikita
> 
>> Hi Nikita,
>>
>> [auto build test ERROR on net/master]
>>
>> url:
>> https://github.com/0day-ci/linux/commits/Nikita-Yushchenko/net-fec-fix-compile-with-CONFIG_M5272/20161205-181735
>> config: arm-multi_v5_defconfig (attached as .config)
>> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
>> reproduce:
>> wget 
>> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>>  -O ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> # save the attached .config to linux build tree
>> make.cross ARCH=arm 
>>
>> All errors (new ones prefixed by >>):
>>
>>drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
>>>> drivers/net/ethernet/freescale/fec_main.c:3304:7: error: 'FEC_STAT_SIZE' 
>>>> undeclared (first use in this function)
>>   FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>>   ^
>>drivers/net/ethernet/freescale/fec_main.c:3304:7: note: each undeclared 
>> identifier is reported only once for each function it appears in
>>
>> vim +/FEC_STAT_SIZE +3304 drivers/net/ethernet/freescale/fec_main.c
>>
>>   3298   int num_rx_qs;
>>   3299   
>>   3300   fec_enet_get_queue_num(pdev, _tx_qs, _rx_qs);
>>   3301   
>>   3302   /* Init network device */
>>   3303   ndev = alloc_etherdev_mqs(sizeof(struct 
>> fec_enet_private) +
>>> 3304  FEC_STAT_SIZE, num_tx_qs, 
>>> num_rx_qs);
>>   3305   if (!ndev)
>>   3306   return -ENOMEM;
>>   3307   
>>
>> ---
>> 0-DAY kernel test infrastructureOpen Source Technology Center
>> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
>>


Re: [patch net v2] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
Aieee   I was typing too fast today, sorry...

send separate "fix for the fix", or re-send patch without that silly typo?

Nikita

> Hi Nikita,
> 
> [auto build test ERROR on net/master]
> 
> url:
> https://github.com/0day-ci/linux/commits/Nikita-Yushchenko/net-fec-fix-compile-with-CONFIG_M5272/20161205-181735
> config: arm-multi_v5_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm 
> 
> All errors (new ones prefixed by >>):
> 
>drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
>>> drivers/net/ethernet/freescale/fec_main.c:3304:7: error: 'FEC_STAT_SIZE' 
>>> undeclared (first use in this function)
>   FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>   ^
>drivers/net/ethernet/freescale/fec_main.c:3304:7: note: each undeclared 
> identifier is reported only once for each function it appears in
> 
> vim +/FEC_STAT_SIZE +3304 drivers/net/ethernet/freescale/fec_main.c
> 
>   3298int num_rx_qs;
>   3299
>   3300fec_enet_get_queue_num(pdev, _tx_qs, _rx_qs);
>   3301
>   3302/* Init network device */
>   3303ndev = alloc_etherdev_mqs(sizeof(struct 
> fec_enet_private) +
>> 3304   FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>   3305if (!ndev)
>   3306return -ENOMEM;
>   3307
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 


[patch net v2] net: fec: fix compile with CONFIG_M5272

2016-12-05 Thread Nikita Yushchenko
Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.

However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.

Fix that by adding explicit handling of !defined(CONFIG_M5272) case.

Fixes: 4dfb80d18d05 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
Changes since v1:
- instead of #ifdef'ing calls to fec_enet_update_ethtool_stats(), add
  definition of empty fec_enet_update_ethtool_stats() for CONFIG_M5272
  case,
- add FEC_STATS_SIZE macro to avoid #ifdef in the middle of
  alloc_etherdev_mqs() args.

 drivers/net/ethernet/freescale/fec_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5f77caa59534..741cf4a57bfc 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,6 +2313,8 @@ static const struct fec_stat {
{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
+#define FEC_STATS_SIZE (ARRAY_SIZE(fec_stats) * sizeof(u64))
+
 static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
@@ -2330,7 +2332,7 @@ static void fec_enet_get_ethtool_stats(struct net_device 
*dev,
if (netif_running(dev))
fec_enet_update_ethtool_stats(dev);
 
-   memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
+   memcpy(data, fep->ethtool_stats, FEC_STATS_SIZE);
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2355,6 +2357,12 @@ static int fec_enet_get_sset_count(struct net_device 
*dev, int sset)
return -EOPNOTSUPP;
}
 }
+
+#else  /* !defined(CONFIG_M5272) */
+#define FEC_STATS_SIZE 0
+static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 static int fec_enet_nway_reset(struct net_device *dev)
@@ -3293,8 +3301,7 @@ fec_probe(struct platform_device *pdev)
 
/* Init network device */
ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
- ARRAY_SIZE(fec_stats) * sizeof(u64),
- num_tx_qs, num_rx_qs);
+ FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
 
-- 
2.1.4



[patch net] net: fec: fix compile with CONFIG_M5272

2016-12-04 Thread Nikita Yushchenko
Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.

However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.

Fix that by adding needed #if !defined(CONFIG_M5272).

Fixes: 4dfb80d18d05 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 6a20c24a2003..89e902767abb 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2884,7 +2884,9 @@ fec_enet_close(struct net_device *ndev)
if (fep->quirks & FEC_QUIRK_ERR006687)
imx6q_cpuidle_fec_irqs_unused();
 
+#if !defined(CONFIG_M5272)
fec_enet_update_ethtool_stats(ndev);
+#endif
 
fec_enet_clk_enable(ndev, false);
pinctrl_pm_select_sleep_state(>pdev->dev);
@@ -3192,7 +3194,9 @@ static int fec_enet_init(struct net_device *ndev)
 
fec_restart(ndev);
 
+#if !defined(CONFIG_M5272)
fec_enet_update_ethtool_stats(ndev);
+#endif
 
return 0;
 }
@@ -3292,9 +3296,11 @@ fec_probe(struct platform_device *pdev)
fec_enet_get_queue_num(pdev, _tx_qs, _rx_qs);
 
/* Init network device */
-   ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
- ARRAY_SIZE(fec_stats) * sizeof(u64),
- num_tx_qs, num_rx_qs);
+   ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private)
+#if !defined(CONFIG_M5272)
+ + ARRAY_SIZE(fec_stats) * sizeof(u64)
+#endif
+ , num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
 
-- 
2.1.4



DSA vs envelope frames

2016-11-30 Thread Nikita Yushchenko
>> (1) When DSA is in use, frames processed by FEC chip contain DSA tag and
>> thus can be larger than hardcoded limit of 1522. This issue is not
>> FEC-specific, any driver that hardcodes maximum frame size to 1522 (many
>> do) will have this issue if used with DSA.
> 
> BTW I'm trying to introduce envelope frames to solve this kind of problems.
> http://marc.info/?t=14749669155=1=2
> http://marc.info/?t=14749669153=1=2
> http://marc.info/?t=14749669152=1=2
> http://marc.info/?t=14749669154=1=2
> http://marc.info/?t=14749669151=1=2
> 
> It needs jumbo frame support of NICs though.

Thanks for pointing to this.

Indeed frame with DSA tag conceptually is an envelope frame.

ndev->env_hdr_len introduced by your patches, actually is explicitly
handled difference between (MTU + 18) and frame that HW should allow.
If this is known, hardware can be configured to work with DSA. At least
FEC hardware that can send and receive "slightly larger" frames after
simple register configuration.

Furthermore, since DSA configuration is known statically (it comes from
device tree), ndo_set_env_hdr_len method could be automatically called
at init, making setup working by default if driver supports that. And if
not, perhaps can automatically lower MTU.

Looks like a solution :)

What's current status of this work?

What is not really clear - what if several tagging protocols are used
together. AFAIU, things may be more complex that simple appending of
tags, e.g. EDSA tag can carry VLAN id inside.

Nikita



Re: [patch net / RFC] net: fec: increase frame size limitation to actually available buffer

2016-11-29 Thread Nikita Yushchenko
> But I think it is not necessary since the driver don't support jumbo frame.

Hardcoded 1522 raises two separate issues.

(1) When DSA is in use, frames processed by FEC chip contain DSA tag and
thus can be larger than hardcoded limit of 1522. This issue is not
FEC-specific, any driver that hardcodes maximum frame size to 1522 (many
do) will have this issue if used with DSA.

Clean solution for this must take into account that difference between
MTU and max frame size is no longer known at compile time. Actually this
is the case even without DSA, due to VLANs: max frame size is (MTU + 18)
without VLANs, but (MTU + 22) with VLANs. However currently drivers tend
to ignore this and hardcode 22.  With DSA, 22 is not enough, need to add
switch-specific tag size to that.

Not yet sure how to handle this. DSA-specific API to find out tag size
could be added, but generic solution should handle all cases of dynamic
difference between MTU and max frame size, not only DSA.


(2) There is some demand to use larger frames for optimization purposes.

FEC register fields that limit frame size are 14-bit, thus allowing
frames up to (4k-1). I'm about to prepare a larger patch:
- add ndo_change_mtu handler, allowing MTU up to (4k - overhead),
- set MAX_FL / TRUNC_FL based on configured MTU,
- if necessary, do buffer reallocation with larger buffers.

Is this suitable for upstreaming?
Is there any policy related to handling larger frames?


[patch net / RFC] net: fec: increase frame size limitation to actually available buffer

2016-11-29 Thread Nikita Yushchenko
Fec driver uses Rx buffers of 2k, but programs hardware to limit
incoming frames to 1522 bytes. This raises issues when FEC device
is used with DSA (since DSA tag can make frame larger), and also
disallows manual sending and receiving larger frames.

This patch removes the limitation, allowing Rx size up to entire buffer.
At the same time possible Tx size is increased as well, because hardware
uses the same register field to limit Rx and Tx.

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 33 +++
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 73ac35780611..c09789a71024 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -171,30 +171,12 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #endif
 #endif /* CONFIG_M5272 */
 
-/* The FEC stores dest/src/type/vlan, data, and checksum for receive packets.
- */
-#define PKT_MAXBUF_SIZE1522
-#define PKT_MINBUF_SIZE64
-#define PKT_MAXBLR_SIZE1536
-
 /* FEC receive acceleration */
 #define FEC_RACC_IPDIS (1 << 1)
 #define FEC_RACC_PRODIS(1 << 2)
 #define FEC_RACC_SHIFT16   BIT(7)
 #define FEC_RACC_OPTIONS   (FEC_RACC_IPDIS | FEC_RACC_PRODIS)
 
-/*
- * The 5270/5271/5280/5282/532x RX control register also contains maximum frame
- * size bits. Other FEC hardware does not, so we need to take that into
- * account when setting it.
- */
-#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || 
\
-defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
-#defineOPT_FRAME_SIZE  (PKT_MAXBUF_SIZE << 16)
-#else
-#defineOPT_FRAME_SIZE  0
-#endif
-
 /* FEC MII MMFR bits definition */
 #define FEC_MMFR_ST(1 << 30)
 #define FEC_MMFR_OP_READ   (2 << 28)
@@ -847,7 +829,8 @@ static void fec_enet_enable_ring(struct net_device *ndev)
for (i = 0; i < fep->num_rx_queues; i++) {
rxq = fep->rx_queue[i];
writel(rxq->bd.dma, fep->hwp + FEC_R_DES_START(i));
-   writel(PKT_MAXBLR_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
+   writel(FEC_ENET_RX_FRSIZE - fep->rx_align,
+  fep->hwp + FEC_R_BUFF_SIZE(i));
 
/* enable DMA1/2 */
if (i)
@@ -895,9 +878,17 @@ fec_restart(struct net_device *ndev)
struct fec_enet_private *fep = netdev_priv(ndev);
u32 val;
u32 temp_mac[2];
-   u32 rcntl = OPT_FRAME_SIZE | 0x04;
+   u32 rcntl = 0x04;
u32 ecntl = 0x2; /* ETHEREN */
 
+   /* The 5270/5271/5280/5282/532x RX control register also contains
+* maximum frame * size bits. Other FEC hardware does not.
+*/
+#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || 
\
+defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
+   rcntl |= (FEC_ENET_RX_FRSIZE - fep->rx_align) << 16;
+#endif
+
/* Whack a reset.  We should wait for this.
 * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
 * instead of reset MAC itself.
@@ -953,7 +944,7 @@ fec_restart(struct net_device *ndev)
else
val &= ~FEC_RACC_OPTIONS;
writel(val, fep->hwp + FEC_RACC);
-   writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
+   writel(FEC_ENET_RX_FRSIZE - fep->rx_align, fep->hwp + FEC_FTRL);
}
 #endif
 
-- 
2.1.4



[patch net v2] net: fec: cache statistics while device is down

2016-11-28 Thread Nikita Yushchenko
Execution 'ethtool -S' on fec device that is down causes OOPS on Vybrid
board:

Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200
pgd = ddecc000
[e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
Internal error: : 1008 [#1] SMP ARM
...

Reason of OOPS is that fec_enet_get_ethtool_stats() accesses fec
registers while IPG clock is stopped by PM.

Fix that by caching statistics in fec_enet_private. Cache is initialized
at device probe time, and updated at statistics request time if device
is up, and also just before turning device off on down path.

Additional locking is not needed, since cached statistics is accessed
either before device is registered, or under rtnl_lock().

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
Changes since v1:
- initialize cache at device probe time

 drivers/net/ethernet/freescale/fec.h  |  2 ++
 drivers/net/ethernet/freescale/fec_main.c | 23 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h 
b/drivers/net/ethernet/freescale/fec.h
index c865135f3cb9..5ea740b4cf14 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -574,6 +574,8 @@ struct fec_enet_private {
unsigned int reload_period;
int pps_enable;
unsigned int next_counter;
+
+   u64 ethtool_stats[0];
 };
 
 void fec_ptp_init(struct platform_device *pdev);
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5aa9d4ded214..6a20c24a2003 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,14 +2313,24 @@ static const struct fec_stat {
{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
-static void fec_enet_get_ethtool_stats(struct net_device *dev,
-   struct ethtool_stats *stats, u64 *data)
+static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
int i;
 
for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
-   data[i] = readl(fep->hwp + fec_stats[i].offset);
+   fep->ethtool_stats[i] = readl(fep->hwp + fec_stats[i].offset);
+}
+
+static void fec_enet_get_ethtool_stats(struct net_device *dev,
+  struct ethtool_stats *stats, u64 *data)
+{
+   struct fec_enet_private *fep = netdev_priv(dev);
+
+   if (netif_running(dev))
+   fec_enet_update_ethtool_stats(dev);
+
+   memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2874,6 +2884,8 @@ fec_enet_close(struct net_device *ndev)
if (fep->quirks & FEC_QUIRK_ERR006687)
imx6q_cpuidle_fec_irqs_unused();
 
+   fec_enet_update_ethtool_stats(ndev);
+
fec_enet_clk_enable(ndev, false);
pinctrl_pm_select_sleep_state(>pdev->dev);
pm_runtime_mark_last_busy(>pdev->dev);
@@ -3180,6 +3192,8 @@ static int fec_enet_init(struct net_device *ndev)
 
fec_restart(ndev);
 
+   fec_enet_update_ethtool_stats(ndev);
+
return 0;
 }
 
@@ -3278,7 +3292,8 @@ fec_probe(struct platform_device *pdev)
fec_enet_get_queue_num(pdev, _tx_qs, _rx_qs);
 
/* Init network device */
-   ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private),
+   ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
+ ARRAY_SIZE(fec_stats) * sizeof(u64),
  num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
-- 
2.1.4



Re: [patch net] net: fec: cache statistics while device is down

2016-11-28 Thread Nikita Yushchenko
>  >
>  >+   fec_enet_update_ethtool_stats(ndev);
>  >+
> If user never open the interface, ethtool_stats[] always is 0 that are not 
> expected.
> So, it also should be called at . fec_enet_init() ?

I don't think that zero stats is wrong for never-opened interface.

However a call at init path won't hurt, so I'll add it, just to clear
the question.

Nikita


[patch net] net: fec: cache statistics while device is down

2016-11-28 Thread Nikita Yushchenko
Execution 'ethtool -S' on fec device that is down causes OOPS on Vybrid
board:

Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200
pgd = ddecc000
[e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
Internal error: : 1008 [#1] SMP ARM
...

Reason of OOPS is that fec_enet_get_ethtool_stats() accesses fec
registers while IPG clock is stopped by PM.

Fix that by caching statistics in fec_enet_private. Cache is updated
just before statistics request if device is up, and also just before
turning device off on down path.

Additional locking is not needed, since cached statistics is always
updated under rtnl_lock().

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
 drivers/net/ethernet/freescale/fec.h  |  2 ++
 drivers/net/ethernet/freescale/fec_main.c | 21 +
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h 
b/drivers/net/ethernet/freescale/fec.h
index c865135f3cb9..5ea740b4cf14 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -574,6 +574,8 @@ struct fec_enet_private {
unsigned int reload_period;
int pps_enable;
unsigned int next_counter;
+
+   u64 ethtool_stats[0];
 };
 
 void fec_ptp_init(struct platform_device *pdev);
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5aa9d4ded214..7da2d94ec8e5 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,14 +2313,24 @@ static const struct fec_stat {
{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
-static void fec_enet_get_ethtool_stats(struct net_device *dev,
-   struct ethtool_stats *stats, u64 *data)
+static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
int i;
 
for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
-   data[i] = readl(fep->hwp + fec_stats[i].offset);
+   fep->ethtool_stats[i] = readl(fep->hwp + fec_stats[i].offset);
+}
+
+static void fec_enet_get_ethtool_stats(struct net_device *dev,
+  struct ethtool_stats *stats, u64 *data)
+{
+   struct fec_enet_private *fep = netdev_priv(dev);
+
+   if (netif_running(dev))
+   fec_enet_update_ethtool_stats(dev);
+
+   memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2874,6 +2884,8 @@ fec_enet_close(struct net_device *ndev)
if (fep->quirks & FEC_QUIRK_ERR006687)
imx6q_cpuidle_fec_irqs_unused();
 
+   fec_enet_update_ethtool_stats(ndev);
+
fec_enet_clk_enable(ndev, false);
pinctrl_pm_select_sleep_state(>pdev->dev);
pm_runtime_mark_last_busy(>pdev->dev);
@@ -3278,7 +3290,8 @@ fec_probe(struct platform_device *pdev)
fec_enet_get_queue_num(pdev, _tx_qs, _rx_qs);
 
/* Init network device */
-   ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private),
+   ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
+ ARRAY_SIZE(fec_stats) * sizeof(u64),
  num_tx_qs, num_rx_qs);
if (!ndev)
return -ENOMEM;
-- 
2.1.4



Re: [PATCH] net: fec: turn on device when extracting statistics

2016-11-27 Thread Nikita Yushchenko


28.11.2016 04:29, David Miller пишет:
> From: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
> Date: Fri, 25 Nov 2016 13:02:00 +0300
> 
>> +int i, ret;
>> +
>> +ret = pm_runtime_get_sync(>pdev->dev);
>> +if (IS_ERR_VALUE(ret)) {
>> +memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
>> +return;
>> +}
> 
> This really isn't the way to do this.
> 
> When the device is suspended and the clocks are going to be stopped,
> you must fetch the statistic values into a software copy and provide
> those if the device is suspended when statistics are requested.

Ok, can do that, although can't see what's wrong with waking device
here. The situation of requesting stats on down device isn't something
widely used, thus keeping handling of that as local as possible looks
better for me.


[patch net] net: dsa: fix unbalanced dsa_switch_tree reference counting

2016-11-27 Thread Nikita Yushchenko
_dsa_register_switch() gets a dsa_switch_tree object either via
dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
in returned object (resulting into caller not owning a reference),
while later path does create a new object (resulting into caller owning
a reference).

The rest of _dsa_register_switch() assumes that it owns a reference, and
calls dsa_put_dst().

This causes a memory breakage if first switch in the tree initialized
successfully, but second failed to initialize. In particular, freed
dsa_swith_tree object is left referenced by switch that was initialized,
and later access to sysfs attributes of that switch cause OOPS.

To fix, need to add kref_get() call to dsa_get_dst().

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Reviewed-by: Andrew Lunn <and...@lunn.ch>
---
 net/dsa/dsa2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index f8a7d9aab437..5fff951a0a49 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -28,8 +28,10 @@ static struct dsa_switch_tree *dsa_get_dst(u32 tree)
struct dsa_switch_tree *dst;
 
list_for_each_entry(dst, _switch_trees, list)
-   if (dst->tree == tree)
+   if (dst->tree == tree) {
+   kref_get(>refcount);
return dst;
+   }
return NULL;
 }
 
-- 
2.1.4



[PATCH] net: dsa: fix unbalanced dsa_switch_tree reference counting

2016-11-25 Thread Nikita Yushchenko
_dsa_register_switch() gets a dsa_switch_tree object either via
dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
in returned object (resulting into caller not owning a reference),
while later path does create a new object (resulting into caller owning
a reference).

The rest of _dsa_register_switch() assumes that it owns a reference, and
calls dsa_put_dst().

This causes a memory breakage if first switch in the tree initialized
successfully, but second failed to initialize. In particular, freed
dsa_swith_tree object is left referenced by switch that was initialized,
and later access to sysfs attributes of that switch cause OOPS.

To fix, need to add kref_get() call to dsa_get_dst().

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
 net/dsa/dsa2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index f8a7d9aab437..5fff951a0a49 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -28,8 +28,10 @@ static struct dsa_switch_tree *dsa_get_dst(u32 tree)
struct dsa_switch_tree *dst;
 
list_for_each_entry(dst, _switch_trees, list)
-   if (dst->tree == tree)
+   if (dst->tree == tree) {
+   kref_get(>refcount);
return dst;
+   }
return NULL;
 }
 
-- 
2.1.4



[PATCH] net: fec: turn on device when extracting statistics

2016-11-25 Thread Nikita Yushchenko
Execution 'ethtool -S' on fec device that is down causes OOPS on Vybrid
board:

Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200
pgd = ddecc000
[e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
Internal error: : 1008 [#1] SMP ARM
...

Reason of OOPS is that fec_enet_get_ethtool_stats() accesses fec
registers while IPG clock is stopped by PM.

Fix that by wrapping statistics extraction into pm_runtime_get_sync()
... pm_runtime_put_autosuspend() braces.

Signed-off-by: Nikita Yushchenko <nikita.yo...@cogentembedded.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 5aa9d4ded214..9c7592b80ce8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2317,10 +2317,19 @@ static void fec_enet_get_ethtool_stats(struct 
net_device *dev,
struct ethtool_stats *stats, u64 *data)
 {
struct fec_enet_private *fep = netdev_priv(dev);
-   int i;
+   int i, ret;
+
+   ret = pm_runtime_get_sync(>pdev->dev);
+   if (IS_ERR_VALUE(ret)) {
+   memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
+   return;
+   }
 
for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
data[i] = readl(fep->hwp + fec_stats[i].offset);
+
+   pm_runtime_mark_last_busy(>pdev->dev);
+   pm_runtime_put_autosuspend(>pdev->dev);
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
-- 
2.1.4



Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-13 Thread Nikita Yushchenko
>>> It would make more sense to update the DMA API for
>>> __dma_page_cpu_to_dev on ARM so that you don't invalidate the cache if
>>> the direction is DMA_FROM_DEVICE.
>>
>> No, in generic case it's unsafe.
>>
>> If CPU issued a write to a location, and sometime later that location is
>> used as DMA buffer, there is danger that write is still in cache only,
>> and writeback is pending. Later this writeback can overwrite data
>> written to memory via DMA, causing corruption.
> 
> Okay so if I understand it correctly then the invalidation in
> sync_for_device is to force any writes to be flushed out, and the
> invalidation in sync_for_cpu is to flush out any speculative reads.
> So without speculative reads then the sync_for_cpu portion is not
> needed.  You might want to determine if the core you are using
> supports the speculative reads, if not you might be able to get away
> without having to do the sync_for_cpu at all.

pl310 L2 cache controller does support prefetching - and I think most
arm systems use pl310.


>>> Changing the driver code for this won't necessarily work on all
>>> architectures, and on top of it we have some changes planned which
>>> will end up making the pages writable in the future to support the
>>> ongoing XDP effort.  That is one of the reasons why I would not be
>>> okay with changing the driver to make this work.
>>
>> Well I was not really serious about removing that sync_for_device() in
>> mainline :)   Although >20% throughput win that this provides is
>> impressive...
> 
> I agree the improvement is pretty impressive.  The think is there are
> similar gains we can generate on x86 by stripping out bits and pieces
> that are needed for other architectures.  I'm just wanting to make
> certain we aren't optimizing for one architecture at the detriment of
> others.

Well in ideal world needs of other architectures should not limit x86 -
and if they do then that's a bug and should be fixed - by designing
proper set of abstractions :)

Perhaps issue is that "do whatever needed for device to perform DMA
correctly" semantics of dma_map() / sync_for_device() - and symmetric
for dma_unmap() / sync_for_cpu() - is too abstract and that hurts
performance. In particular, "setup i/o" and "sync caches" is different
activity with conflicting performance properties: for better
performance, one wants to setup i/o for larger blocks, but sync caches
for smaller blocks.  Probably separation of these activities into
different calls is the way for better performance.


>> But what about doing something safer, e.g. adding a bit of tracking and
>> only sync_for_device() what was previously sync_for_cpu()ed?  Will you
>> accept that?
> 
> The problem is that as we move things over for XDP we will be looking
> at having the CPU potentially write to any spot in the region that was
> mapped as we could append headers to the front or pad data onto the
> end of the frame.  It is probably safest for us to invalidate the
> entire region just to make sure we don't have a collision with
> something that is writing to the page.

Can't comment without knowning particular access patterns that XDP will
cause. Still rule is simple - "invalidate more" does hurt performance,
thus need to invalidate minimal required area. To avoid this
invalidation thing hurting performance on x86 that does not need
invalidation at all, good idea is to use some compile-time magic - just
to compile out unneeded things completely.

Still, XDP is future question, currently igb does not use it. Why not
improve sync_for_cpu() / sync_for_device() pairing in the current code?
I can propare such a patch. If XDP will make it irrelevant in future,
perhaps it could be just undone (and if this will cause performance
degradation, then it will be something to work on)


> So for example in the near future I am planning to expand out the
> DMA_ATTR_SKIP_CPU_SYNC DMA attribute beyond just the ARM architecture
> to see if I can expand it for use with SWIOTLB.  If we can use this on
> unmap we might be able to solve some of the existing problems that
> required us to make the page read-only since we could unmap the page
> without invalidating any existing writes on the page.

Ack. Actually it is the same decoupling between "setup i/o" and "sync
caches" I've mentioned above.

Nikita


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-12 Thread Nikita Yushchenko
> It would make more sense to update the DMA API for
> __dma_page_cpu_to_dev on ARM so that you don't invalidate the cache if
> the direction is DMA_FROM_DEVICE.

No, in generic case it's unsafe.

If CPU issued a write to a location, and sometime later that location is
used as DMA buffer, there is danger that write is still in cache only,
and writeback is pending. Later this writeback can overwrite data
written to memory via DMA, causing corruption.


> The point I was trying to make is that you are invalidating the cache
> in both the sync_for_device and sync_for_cpu.  Do you really need that
> for ARM or do you need to perform the invalidation on sync_for_device
> if that may be pushed out anyway?  If you aren't running with with
> speculative look-ups do you even need the invalidation in
> sync_for_cpu?

I'm not lowlevel arm guru and I don't know for sure. Probably another
CPU core can be accessing locations neighbor page, causing specilative
load of locations in DMA page.


> Changing the driver code for this won't necessarily work on all
> architectures, and on top of it we have some changes planned which
> will end up making the pages writable in the future to support the
> ongoing XDP effort.  That is one of the reasons why I would not be
> okay with changing the driver to make this work.

Well I was not really serious about removing that sync_for_device() in
mainline :)   Although >20% throughput win that this provides is
impressive...

But what about doing something safer, e.g. adding a bit of tracking and
only sync_for_device() what was previously sync_for_cpu()ed?  Will you
accept that?

Nikita


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-12 Thread Nikita Yushchenko
>>> To get some throughput improvement, I propose removal of that
>>> sync_for_device() before reusing buffer. Will you accept such a patch ;)
>>
>> Not one that gets rid of sync_for_device() in the driver.  From what I
>> can tell there are some DMA APIs that use that to perform the
>> invalidation on the region of memory so that it can be DMAed into.
>> Without that we run the risk of having a race between something the
>> CPU might have placed in the cache and something the device wrote into
>> memory.  The sync_for_device() call is meant to invalidate the cache
>> for the region so that when the device writes into memory there is no
>> risk of that race.
> 
> I'm not expert, but some thought...
> 
> Just remember that the cpu can do speculative memory and cache line reads.
> So you need to ensure there are no dirty cache lines when the receive
> buffer is setup and no cache lines at all at before looking at the frame.

Exactly.

And because of that, arm does cache invalidation both in sync_for_cpu()
and sync_for_device().

> If you can 100% guarantee the cpu hasn't dirtied the cache then I
> think the invalidate prior to reusing the buffer can be skipped.
> But I wouldn't want to debug that going wrong.

I've written earlier in this thread why it is the case for igb - as long
as Alexander's statement that igb's buffers are readonly for the stack
is true. If Alexander's statement is wrong, then igb is vulnerable to
issue I've described in the first message of this thread.

Btw, we are unable get any breakage with that sync_to_device() removed
already for a day. And - our tests run with software checksumming,
because on our hardware i210 is wired connected to DSA switch and in
this config, no i210 offloading works (i210 hardware gets frames with
eDSA headers that it can't parse). Thus any breakage should be
immediately discovered.

And throughput measured by iperf raises from 650 to 850 Mbps.

Nikita


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-12 Thread Nikita Yushchenko
>>> The main reason why this isn't a concern for the igb driver is because
>>> we currently pass the page up as read-only.  We don't allow the stack
>>> to write into the page by keeping the page count greater than 1 which
>>> means that the page is shared.  It isn't until we unmap the page that
>>> the page count is allowed to drop to 1 indicating that it is writable.
>>
>> Doesn't that mean that sync_to_device() in igb_reuse_rx_page() can be
>> avoided? If page is read only for entire world, then it can't be dirty
>> in cache and thus device can safely write to it without preparation step.
> 
> For the sake of correctness we were adding the
> dma_sync_single_range_for_device.

Could you please elaborate this "for sake of correctness"?

If by "correctness" you mean ensuring that buffer gets frame DMAed by
device and that's not broken by cache activity, then:
- on first use of this buffer after page allocation, sync_for_device()
is not needed due to previous dma_page_map() call,
- on later uses of the same buffer, sync_for_device() is not needed due
to buffer being read-only since dma_page_map() call, thus it can't be
dirty in cache and thus no writebacks of this area can be possible.

If by "correctness" you mean strict following "ownership" concept - i.e.
memory area is "owned" either by cpu or by device, and "ownersip" must
be passed to device before DMA and back to cpu after DMA - then, igb
driver already breaks these rules anyway:
- igb calls dma_map_page() at page allocation time, thus entire page
becomes "owned" by device,
- and then, on first use of second buffer inside the page, igb calls
sync_for_device() for buffer area, despite of that area is already
"owned" by device,
- and later, if a buffer within page gets reused, igb calls
sync_for_device() for entire buffer, despite of only part of buffer was
sync_for_cpu()'ed at time of completing receive of previous frame into
this buffer,
- and later, igb calls dma_unmap_page(), despite of that part of page
was sync_for_cpu()'ed and thus is "owned" by CPU.

Given all that, not calling sync_for_device() before reusing buffer
won't make picture much worse :)

> Since it is an DMA_FROM_DEVICE
> mapping calling it should really have no effect for most DMA mapping
> interfaces.

Unfortunately dma_sync_single_range_for_device() *is* slow on imx6q - it
does cache invalidation.  I don't really understand why invalidating
cache can be slow - it only removes data from cache, it should not
access slow outer memory - but cache invalidation *is* in top of perf
profiles.

To get some throughput improvement, I propose removal of that
sync_for_device() before reusing buffer. Will you accept such a patch ;)


> Also you may want to try updating to the 4.8 version of the driver.
> It reduces the size of the dma_sync_single_range_for_cpu loops by
> reducing the sync size down to the size that was DMAed into the
> buffer.

Actually that patch came out of the work I'm currently participating in
;).  Sure I have it.

> Specifically I believe
> the 0->100% accounting problem is due to the way this is all tracked.

Thanks for this hint - shame on me not realizing this earlier...

> You may want to try pulling the most recent net-next kernel and
> testing that to see if you still see the same behavior as Eric has
> recently added a fix that is meant to allow for better sharing between
> softirq polling and applications when dealing with stuff like UDP
> traffic.
> 
> As far as identifying the problem areas your best bet would be to push
> the CPU to 100% and then identify the hot spots.

Thanks for hints


Nikita


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-10 Thread Nikita Yushchenko
Hi Alexander

Thanks for your explanation.

> The main reason why this isn't a concern for the igb driver is because
> we currently pass the page up as read-only.  We don't allow the stack
> to write into the page by keeping the page count greater than 1 which
> means that the page is shared.  It isn't until we unmap the page that
> the page count is allowed to drop to 1 indicating that it is writable.

Doesn't that mean that sync_to_device() in igb_reuse_rx_page() can be
avoided? If page is read only for entire world, then it can't be dirty
in cache and thus device can safely write to it without preparation step.

Nikita


P.S.

We are observing strange performance anomaly with igb on imx6q board.

Test is - simple iperf UDP receive. Other host runs
  iperf -c X.X.X.X -u -b xxxM -t 300 -i 3

Imx6q board can run iperf -s -u, or it can run nothing - result is the same.

While generated traffic (controlled via xxx) is slow, softirq thread on
imx6 board takes near-zero cpu time.  With increasing xxx, it still is
near zero - up to some moment about 700 Mbps. At this moment softirqd
cpu usage suddenly jumps to almost 100%.  Without anything in between:
it is near-zero with slightly smaller traffic, and it is immediately
>99% with slightly larger traffic.

Profiling this situation (>99% in softirqd) with perf gives up to 50%
hits inside cache invalidation loops. That's why originally we thought
cache invalidation is slow. But having the above dependency between
traffic and softirq cpu usage (where napi code runs) can't be explained
with slow cache invalidation.

Also there are additional factors:
- if UDP traffic is dropped - via iptables, or via forcing error paths
at different points of network stack - softirqd cpu usage drops back to
near-zero - although it still does all the same cache invalidations,
- I tried to modify igb driver to disallow page reuse (made
igb_add_rx_frag() always returning false). Result was - "border traffic"
where softirq cpu usage goes from zero to 100% changed from ~700 Mbps to
~400 Mbps.

Any ideas what can happen there, and how to debug it?

TIA


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-10 Thread Nikita Yushchenko
>> All the data has been synced
> 
> Non-synced data is write done by CPU executing upper layers of network stack,

Upper layers shall never get area that is still dma_map()ed and will be
dma_unmap()ed in future.  But with igb, this is exactly what happens.


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-10 Thread Nikita Yushchenko
>> Hmm...  I'm not about device writing to memory.
> 
> This absolutely is about whether the device wrote into the
> area or not.

Not only.
 
>> Sequence in igb driver is:
>>
>> dma_map(full_page)
>>   
>> sync_to_cpu(half_page);
>> skb_add_rx_frag(skb, half_page);
>> napi_gro_receive(skb);
>>   ...
>> dma_unmap(full_page)
>>
>> What I'm concerned about is - same area is first passed up to network
>> stack, and _later_ dma_unmap()ed.  Is this indeed safe?
> 
> dma_unmap() should never write anything unless the device has
> meanwhile written to that chunk of memory.

dma_unmap() for DMA_FROM_DEVICE never writes whatever to memory,
regardless of what device did.

dma_unmap() for DMA_FROM_DEVICE ensures that data written to memory
by device (if any) is visible to CPU. Cache may contain stale data
for that memory region. To drop that from cache, dma_unmap() for
DMA_FROM_DEVICE does cache invalidation.

static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
  handle & ~PAGE_MASK, size, dir);
}

static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
size_t size, enum dma_data_direction dir)
{
...
if (dir != DMA_TO_DEVICE) {
outer_inv_range(paddr, paddr + size);

dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);
}
...
}


> If the device made no intervening writes into the area, dma_unmap()
> should not cause any data to be written to that area, period.

I'm not about writing.

I'm about just the opposite - dropping not-written data from cache.

- napi_gro_receive(skb) passes area to upper layers of the network stack,
- something in those layers - perhaps packet mangling or such - writes
  to the area,
- this write enters cache but does not end into memory immediately,
- at this moment, igb does dma_unmap(),
- write that was in cache but not yet in memory gets lost.


> In your example above, consider the case where the device never
> writes into the memory area after sync_to_cpu().  In that case
> there is nothing that dma_unmap() can possibly write.
> All the data has been synced

Non-synced data is write done by CPU executing upper layers of network stack,


Nikita


Re: igb driver can cause cache invalidation of non-owned memory?

2016-10-10 Thread Nikita Yushchenko
>> With this scheme, page used for Rx is completely dma_map()ed at
>> allocation time, split into two buffers, and individual buffer is
>> sync_to_cpu()ed AND PASSED TO NETWORK STACK via skb_add_rx_frag() -
>> while driver driver still uses other buffer. Later, when driver decides
>> to no longer use this page, it will dma_unmap() it completely - which on
>> archs with non-coherent caches means cache invalidation. This cache
>> invalidation will include area that is already passed elsewhere.
> 
> This should happen only if the device wrote into that piece of the
> memory which it absolutely should not.

Hmm...  I'm not about device writing to memory.


Sequence in igb driver is:

dma_map(full_page)
  
sync_to_cpu(half_page);
skb_add_rx_frag(skb, half_page);
napi_gro_receive(skb);
  ...
dma_unmap(full_page)

What I'm concerned about is - same area is first passed up to network
stack, and _later_ dma_unmap()ed.  Is this indeed safe?


Nikita


igb driver can cause cache invalidation of non-owned memory?

2016-10-10 Thread Nikita Yushchenko
Hi


DMA mapping scheme introduced in commit cbc8e55f6fda ('igb: Map entire
page and sync half instead of mapping and unmapping half pages') back in
2012, and used up to now, can probably cause breakage of unrelated code
on archs with non-coherent caches.

With this scheme, page used for Rx is completely dma_map()ed at
allocation time, split into two buffers, and individual buffer is
sync_to_cpu()ed AND PASSED TO NETWORK STACK via skb_add_rx_frag() -
while driver driver still uses other buffer. Later, when driver decides
to no longer use this page, it will dma_unmap() it completely - which on
archs with non-coherent caches means cache invalidation. This cache
invalidation will include area that is already passed elsewhere. If
external code has performed any writes to that area and writes still are
in cache only, cache invalidation will cause writes to be lost.


I'm not sure if this breakage is indeed possible. I did not face it,
just found while checking how things work.

Code in question is in kernel already for 4 years. However, since (1)
igb is mostly used on x86 where caches are coherent, and (2) Rx buffers
are normally not written to, it could stay unnoticed all that time.

Could somebody please comment on this?


Nikita Yushchenko