Re: canfdtest on flexcan loopback

2020-09-18 Thread Marc Kleine-Budde
On 9/18/20 2:27 PM, Vladimir Oltean wrote:
>> Is there a bug in v5.4.3? v5.4.66 is current latest v5.4.x

> commit 63d5320a0c9b9867628a3a5a12e7f11d4cc109c2
> Author: Paolo Abeni 
> Date:   Tue Feb 18 18:15:44 2020 +0100
> 
> Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"
> 
> [ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ]
> 
> This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe
> 
> Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6

yes, that's probably the the issue.

This concluded into using an old LTS kernel, is worse than using the latest
kernel :)

> ("net: dev: introduce support for sch BYPASS for lockless qdisc").
> I can't find any working solution other than a plain revert.
> 
> This will introduce some minor performance regressions for
> pfifo_fast qdisc. I plan to address them in net-next with more
> indirect call wrapper boilerplate for qdiscs.
> 
> Reported-by: Ahmad Fatoum 
> Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for 
> lockless qdisc")
> Signed-off-by: Paolo Abeni 
> Signed-off-by: David S. Miller 
> Signed-off-by: Greg Kroah-Hartman 
> 
>  net/core/dev.c | 22 ++
>  1 file changed, 2 insertions(+), 20 deletions(-)

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |



signature.asc
Description: OpenPGP digital signature


Re: canfdtest on flexcan loopback

2020-09-18 Thread Vladimir Oltean
On Thu, Sep 17, 2020 at 03:10:28PM +0200, Marc Kleine-Budde wrote:
> On 9/17/20 2:59 PM, Vladimir Oltean wrote:
> > On Wed, Sep 16, 2020 at 03:14:37PM +0300, Vladimir Oltean wrote:
> >> Nonetheless, you bring up a good point. I'll try to bring into net-next
> >> the minimum amount of required delta (which seems to be the
> >> fsl_lx2160a_r1_devtype_data structure only), and I'll re-test.
> >
> > So I'm back with some interesting results.
> >
> > Test 1:
> > NXP LSDK 20.04 based on v5.4.3 - reordering reproduces instantly
> >
> > Test 2:
> > net-next 5.9-rc3 with upstream defconfig and all downstream patches
> > imported from LSDK - hasn't reproduced in 20 hours of testing
> >
> > Test 3:
> > net-next 5.9-rc3 with .config imported from LSDK - hasn't reproduced in
> > 335 iterations
> >
> > Test 4:
> > linux stable linux-5.4.y - hasn't reproduced in 137 iterations
> >
> > Test 5:
> > torvalds v5.4.3 - reproduces instantly
>
> Is there a bug in v5.4.3? v5.4.66 is current latest v5.4.x
>
> Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde   |
> Embedded Linux   | https://www.pengutronix.de  |
> Vertretung West/Dortmund | Phone: +49-231-2826-924 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |
>

63d5320a0c9b9867628a3a5a12e7f11d4cc109c2 is the first fixed commit

commit 63d5320a0c9b9867628a3a5a12e7f11d4cc109c2
Author: Paolo Abeni 
Date:   Tue Feb 18 18:15:44 2020 +0100

Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"

[ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ]

This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe

Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6
("net: dev: introduce support for sch BYPASS for lockless qdisc").
I can't find any working solution other than a plain revert.

This will introduce some minor performance regressions for
pfifo_fast qdisc. I plan to address them in net-next with more
indirect call wrapper boilerplate for qdiscs.

Reported-by: Ahmad Fatoum 
Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for 
lockless qdisc")
Signed-off-by: Paolo Abeni 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 

 net/core/dev.c | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

Re: canfdtest on flexcan loopback

2020-09-17 Thread Marc Kleine-Budde
On 9/17/20 2:59 PM, Vladimir Oltean wrote:
> On Wed, Sep 16, 2020 at 03:14:37PM +0300, Vladimir Oltean wrote:
>> Nonetheless, you bring up a good point. I'll try to bring into net-next
>> the minimum amount of required delta (which seems to be the
>> fsl_lx2160a_r1_devtype_data structure only), and I'll re-test.
> 
> So I'm back with some interesting results.
> 
> Test 1:
> NXP LSDK 20.04 based on v5.4.3 - reordering reproduces instantly
> 
> Test 2:
> net-next 5.9-rc3 with upstream defconfig and all downstream patches
> imported from LSDK - hasn't reproduced in 20 hours of testing
> 
> Test 3:
> net-next 5.9-rc3 with .config imported from LSDK - hasn't reproduced in
> 335 iterations
> 
> Test 4:
> linux stable linux-5.4.y - hasn't reproduced in 137 iterations
> 
> Test 5:
> torvalds v5.4.3 - reproduces instantly

Is there a bug in v5.4.3? v5.4.66 is current latest v5.4.x

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |



signature.asc
Description: OpenPGP digital signature


Re: canfdtest on flexcan loopback

2020-09-17 Thread Vladimir Oltean
On Wed, Sep 16, 2020 at 03:14:37PM +0300, Vladimir Oltean wrote:
> Nonetheless, you bring up a good point. I'll try to bring into net-next
> the minimum amount of required delta (which seems to be the
> fsl_lx2160a_r1_devtype_data structure only), and I'll re-test.

So I'm back with some interesting results.

Test 1:
NXP LSDK 20.04 based on v5.4.3 - reordering reproduces instantly

Test 2:
net-next 5.9-rc3 with upstream defconfig and all downstream patches
imported from LSDK - hasn't reproduced in 20 hours of testing

Test 3:
net-next 5.9-rc3 with .config imported from LSDK - hasn't reproduced in
335 iterations

Test 4:
linux stable linux-5.4.y - hasn't reproduced in 137 iterations

Test 5:
torvalds v5.4.3 - reproduces instantly

Thanks,
-Vladimir

RE: canfdtest on flexcan loopback

2020-09-16 Thread Joakim Zhang

> -Original Message-
> From: Marc Kleine-Budde 
> Sent: 2020年9月16日 20:01
> To: Vladimir Oltean 
> Cc: w...@grandegger.com; Pankaj Bansal ; Pankaj
> Bansal (OSS) ; linux-...@vger.kernel.org;
> Joakim Zhang ; linux-kernel@vger.kernel.org;
> Vladimir Oltean 
> Subject: Re: canfdtest on flexcan loopback
> 
> On 9/16/20 1:45 PM, Vladimir Oltean wrote:
> > On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote:
> >> Which driver are you using? The mainline driver only uses one TX buffer.
> >
> > Are there multiple flexcan drivers in circulation? Yes, the mainline
> > driver with a single priv->tx_mb.
> 
> I assume nxp has several patches on their kernels. Are you using the mainline
> kernel or the one that's provided by nxp?

Hi Marc, Vladimir,

Yes, Vladimir uses kernel provided by NXP, I also help try to look into this 
issue, but it can't be reproduced on i.MX platforms.

Our local flexcan driver is almost cherry-picked from linux-can-next/flexcan 
branch to implement CAN FD mode, which is a version that cleaned up by you 
before.
I confirm that we still use single TX mailbox to send frames, per my 
understanding, reorder should not occur here.

According to Vladimir's description, exactly it happens:
"I have added trace points to the end of the flexcan_start_xmit function, which 
print the entire skb, and the frames appear to be written to the TX message 
buffer in the correct order. They are seen, however, in the incorrect order on 
the wire."

Since Vladimir only test Classic mode, he can turn to 5.4 upstream kernel to 
see if this reorder issue also can be reproduced. @Vladimir Oltean, could you 
please have a try?
To easy the test, I think you only need replace below several files at local 
side, then use "fsl,ls1021ar2-flexcan" compatible string in dts.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/can/flexcan.c?h=v5.4.65
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/can/rx-offload.c?h=v5.4.65
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/can/rx-offload.h?h=v5.4.65

If it can't work, suggest to use 5.4 upstream kernel to reproduce this issue.

Best Regards,
Joakim Zhang
> Marc
> 
> --
> Pengutronix e.K. | Marc Kleine-Budde   |
> Embedded Linux   | https://www.pengutronix.de  |
> Vertretung West/Dortmund | Phone: +49-231-2826-924 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |



Re: canfdtest on flexcan loopback

2020-09-16 Thread Vladimir Oltean
On Wed 9/2/2020 10:09 AM, Wolfgang Grandegger wrote:
> canfdtest normally runs on the DUT *and* a the host. The DUT receives
> the messages from the host, increments the frame data bytes and then
> sends them back to the host. With "loopback" mode, the data bytes are
> not incremented and that's what you see above.
>
> Wolfgang

Wolfgang is of course right, but we're nonetheless investigating what
seems to be a real problem, and what Pankaj had seen was a red herring.

So currently what I suspect is going on, when I am running canfdtest
between 2 LS1028A-RDB boards, is that the DUT is reordering frames on
TX.

See, for example, the screenshot below:
https://drive.google.com/file/d/1rOeW3aXh3kPh1CJ39lCccRfjFz5JN5I6/view?usp=sharing

I have added trace points to the end of the flexcan_start_xmit function,
which print the entire skb, and the frames appear to be written to the
TX message buffer in the correct order. They are seen, however, in the
incorrect order on the wire.

Thanks,
-Vladimir


Re: canfdtest on flexcan loopback

2020-09-16 Thread Marc Kleine-Budde
On 9/16/20 1:04 PM, Vladimir Oltean wrote:
> [ resending, forgot to copy Wolfgang ]
> 
> On Wed 9/2/2020 10:09 AM, Wolfgang Grandegger wrote:
>> canfdtest normally runs on the DUT *and* a the host. The DUT receives
>> the messages from the host, increments the frame data bytes and then
>> sends them back to the host. With "loopback" mode, the data bytes are
>> not incremented and that's what you see above.
>>
>> Wolfgang
> 
> Wolfgang is of course right, but we're nonetheless investigating what
> seems to be a real problem, and what Pankaj had seen was a red herring.
> 
> So currently what I suspect is going on, when I am running canfdtest
> between 2 LS1028A-RDB boards, is that the DUT is reordering frames on
> TX.
> 
> See, for example, the screenshot below:
> https://drive.google.com/file/d/1rOeW3aXh3kPh1CJ39lCccRfjFz5JN5I6/view?usp=sharing
> 
> I have added trace points to the end of the flexcan_start_xmit function,
> which print the entire skb, and the frames appear to be written to the
> TX message buffer in the correct order. They are seen, however, in the
> incorrect order on the wire.

Which driver are you using? The mainline driver only uses one TX buffer.

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |



signature.asc
Description: OpenPGP digital signature


Re: canfdtest on flexcan loopback

2020-09-16 Thread Vladimir Oltean
On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote:
> Which driver are you using? The mainline driver only uses one TX buffer.

Are there multiple flexcan drivers in circulation? Yes, the mainline
driver with a single priv->tx_mb.

Thanks,
-Vladimir


Re: canfdtest on flexcan loopback

2020-09-16 Thread Vladimir Oltean
[ resending, forgot to copy Wolfgang ]

On Wed 9/2/2020 10:09 AM, Wolfgang Grandegger wrote:
> canfdtest normally runs on the DUT *and* a the host. The DUT receives
> the messages from the host, increments the frame data bytes and then
> sends them back to the host. With "loopback" mode, the data bytes are
> not incremented and that's what you see above.
>
> Wolfgang

Wolfgang is of course right, but we're nonetheless investigating what
seems to be a real problem, and what Pankaj had seen was a red herring.

So currently what I suspect is going on, when I am running canfdtest
between 2 LS1028A-RDB boards, is that the DUT is reordering frames on
TX.

See, for example, the screenshot below:
https://drive.google.com/file/d/1rOeW3aXh3kPh1CJ39lCccRfjFz5JN5I6/view?usp=sharing

I have added trace points to the end of the flexcan_start_xmit function,
which print the entire skb, and the frames appear to be written to the
TX message buffer in the correct order. They are seen, however, in the
incorrect order on the wire.

Thanks,
-Vladimir


Re: canfdtest on flexcan loopback

2020-09-16 Thread Vladimir Oltean
On Wed, Sep 16, 2020 at 02:01:11PM +0200, Marc Kleine-Budde wrote:
> On 9/16/20 1:45 PM, Vladimir Oltean wrote:
> > On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote:
> >> Which driver are you using? The mainline driver only uses one TX buffer.
> >
> > Are there multiple flexcan drivers in circulation? Yes, the mainline
> > driver with a single priv->tx_mb.
>
> I assume nxp has several patches on their kernels. Are you using the mainline
> kernel or the one that's provided by nxp?

Ah, ok, that's what you mean.

So, yes, I'm using an NXP-provided kernel, just because the NXP
maintainers didn't bother to submit the SoC support upstream,
apparently.

Here's the diff to net-next, some things were added by me for debugging:
flexcan_dump_regs() and trace_canfd_frame().

I am using the fsl_lx2160a_r1_devtype_data structure, looking at the
delta it seems to me like most of the additions (CAN FD) should be
bypassed because I have commented out FLEXCAN_QUIRK_USE_OFF_TIMESTAMP |
FLEXCAN_QUIRK_TIMESTAMP_SUPPORT_FD.

Nonetheless, you bring up a good point. I'll try to bring into net-next
the minimum amount of required delta (which seems to be the
fsl_lx2160a_r1_devtype_data structure only), and I'll re-test.

--- net-next/drivers/net/can/flexcan.c  2020-05-20 14:15:30.365471681 +0300
+++ qoriq-linux/drivers/net/can/flexcan.c   2020-09-16 12:16:49.362992279 
+0300
@@ -6,6 +6,7 @@
 // Copyright (c) 2009 Sascha Hauer, Pengutronix
 // Copyright (c) 2010-2017 Pengutronix, Marc Kleine-Budde 

 // Copyright (c) 2014 David Jander, Protonic Holland
+// Copyright 2015, 2018 NXP
 //
 // Based on code originally by Andrey Volkov 
 
@@ -26,7 +27,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+
+#ifdef CONFIG_IMX_SCU_SOC
+#include 
+#include 
+#endif
 
 #define DRV_NAME   "flexcan"
 
@@ -52,6 +60,7 @@
 #define FLEXCAN_MCR_IRMQ   BIT(16)
 #define FLEXCAN_MCR_LPRIO_EN   BIT(13)
 #define FLEXCAN_MCR_AENBIT(12)
+#define FLEXCAN_MCR_FDEN   BIT(11)
 /* MCR_MAXMB: maximum used MBs is MAXMB + 1 */
 #define FLEXCAN_MCR_MAXMB(x)   ((x) & 0x7f)
 #define FLEXCAN_MCR_IDAM_A (0x0 << 8)
@@ -91,6 +100,7 @@
 #define FLEXCAN_CTRL2_MRP  BIT(18)
 #define FLEXCAN_CTRL2_RRS  BIT(17)
 #define FLEXCAN_CTRL2_EACENBIT(16)
+#define FLEXCAN_CTRL2_ISOCANFDEN   BIT(12)
 
 /* FLEXCAN memory error control register (MECR) bits */
 #define FLEXCAN_MECR_ECRWRDIS  BIT(31)
@@ -134,8 +144,30 @@
(FLEXCAN_ESR_ERR_BUS | FLEXCAN_ESR_ERR_STATE)
 #define FLEXCAN_ESR_ALL_INT \
(FLEXCAN_ESR_TWRN_INT | FLEXCAN_ESR_RWRN_INT | \
-FLEXCAN_ESR_BOFF_INT | FLEXCAN_ESR_ERR_INT | \
-FLEXCAN_ESR_WAK_INT)
+FLEXCAN_ESR_BOFF_INT | FLEXCAN_ESR_ERR_INT)
+
+/* FLEXCAN Bit Timing register (CBT) bits */
+#define FLEXCAN_CBT_BTFBIT(31)
+#define FLEXCAN_CBT_EPRESDIV(x)(((x) & 0x3ff) << 21)
+#define FLEXCAN_CBT_ERJW(x)(((x) & 0x0f) << 16)
+#define FLEXCAN_CBT_EPROPSEG(x)(((x) & 0x3f) << 10)
+#define FLEXCAN_CBT_EPSEG1(x)  (((x) & 0x1f) << 5)
+#define FLEXCAN_CBT_EPSEG2(x)  ((x) & 0x1f)
+
+/* FLEXCAN FD control register (FDCTRL) bits */
+#define FLEXCAN_FDCTRL_FDRATE  BIT(31)
+#define FLEXCAN_FDCTRL_TDCEN   BIT(15)
+#define FLEXCAN_FDCTRL_TDCFAIL BIT(14)
+#define FLEXCAN_FDCTRL_MBDSR1(x)   (((x) & 0x3) << 19)
+#define FLEXCAN_FDCTRL_MBDSR0(x)   (((x) & 0x3) << 16)
+#define FLEXCAN_FDCTRL_TDCOFF(x)   (((x) & 0x1f) << 8)
+
+/* FLEXCAN FD Bit Timing register (FDCBT) bits */
+#define FLEXCAN_FDCBT_FPRESDIV(x)  (((x) & 0x3ff) << 20)
+#define FLEXCAN_FDCBT_FRJW(x)  (((x) & 0x07) << 16)
+#define FLEXCAN_FDCBT_FPROPSEG(x)  (((x) & 0x1f) << 10)
+#define FLEXCAN_FDCBT_FPSEG1(x)(((x) & 0x07) << 5)
+#define FLEXCAN_FDCBT_FPSEG2(x)((x) & 0x07)
 
 /* FLEXCAN interrupt flag register (IFLAG) bits */
 /* Errata ERR005829 step7: Reserve first valid MB */
@@ -161,6 +193,9 @@
 #define FLEXCAN_MB_CODE_TX_DATA(0xc << 24)
 #define FLEXCAN_MB_CODE_TX_TANSWER (0xe << 24)
 
+#define FLEXCAN_MB_CNT_EDL BIT(31)
+#define FLEXCAN_MB_CNT_BRS BIT(30)
+#define FLEXCAN_MB_CNT_ESI BIT(29)
 #define FLEXCAN_MB_CNT_SRR BIT(22)
 #define FLEXCAN_MB_CNT_IDE BIT(21)
 #define FLEXCAN_MB_CNT_RTR BIT(20)
@@ -172,15 +207,17 @@
 /* FLEXCAN hardware feature flags
  *
  * Below is some version info we got:
- *SOC   Version   IP-Version  Glitch- [TR]WRN_INT IRQ Err Memory err RTR 
re-
- *Filter? connected?  Passive detection  
ception in MB
- *   MX25  FlexCAN2  03.00.00.00 nonono   nono
- *   MX28  FlexCAN2  03.00.04.00yes   yesno   nono
- *   MX35  FlexCAN2  03.00.00.00  

Re: canfdtest on flexcan loopback

2020-09-16 Thread Marc Kleine-Budde
On 9/16/20 1:45 PM, Vladimir Oltean wrote:
> On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote:
>> Which driver are you using? The mainline driver only uses one TX buffer.
> 
> Are there multiple flexcan drivers in circulation? Yes, the mainline
> driver with a single priv->tx_mb.

I assume nxp has several patches on their kernels. Are you using the mainline
kernel or the one that's provided by nxp?

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |



signature.asc
Description: OpenPGP digital signature