Re: canfdtest on flexcan loopback
On 9/18/20 2:27 PM, Vladimir Oltean wrote: >> Is there a bug in v5.4.3? v5.4.66 is current latest v5.4.x > commit 63d5320a0c9b9867628a3a5a12e7f11d4cc109c2 > Author: Paolo Abeni > Date: Tue Feb 18 18:15:44 2020 +0100 > > Revert "net: dev: introduce support for sch BYPASS for lockless qdisc" > > [ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ] > > This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe > > Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6 yes, that's probably the the issue. This concluded into using an old LTS kernel, is worse than using the latest kernel :) > ("net: dev: introduce support for sch BYPASS for lockless qdisc"). > I can't find any working solution other than a plain revert. > > This will introduce some minor performance regressions for > pfifo_fast qdisc. I plan to address them in net-next with more > indirect call wrapper boilerplate for qdiscs. > > Reported-by: Ahmad Fatoum > Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for > lockless qdisc") > Signed-off-by: Paolo Abeni > Signed-off-by: David S. Miller > Signed-off-by: Greg Kroah-Hartman > > net/core/dev.c | 22 ++ > 1 file changed, 2 insertions(+), 20 deletions(-) Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature
Re: canfdtest on flexcan loopback
On Thu, Sep 17, 2020 at 03:10:28PM +0200, Marc Kleine-Budde wrote: > On 9/17/20 2:59 PM, Vladimir Oltean wrote: > > On Wed, Sep 16, 2020 at 03:14:37PM +0300, Vladimir Oltean wrote: > >> Nonetheless, you bring up a good point. I'll try to bring into net-next > >> the minimum amount of required delta (which seems to be the > >> fsl_lx2160a_r1_devtype_data structure only), and I'll re-test. > > > > So I'm back with some interesting results. > > > > Test 1: > > NXP LSDK 20.04 based on v5.4.3 - reordering reproduces instantly > > > > Test 2: > > net-next 5.9-rc3 with upstream defconfig and all downstream patches > > imported from LSDK - hasn't reproduced in 20 hours of testing > > > > Test 3: > > net-next 5.9-rc3 with .config imported from LSDK - hasn't reproduced in > > 335 iterations > > > > Test 4: > > linux stable linux-5.4.y - hasn't reproduced in 137 iterations > > > > Test 5: > > torvalds v5.4.3 - reproduces instantly > > Is there a bug in v5.4.3? v5.4.66 is current latest v5.4.x > > Marc > > -- > Pengutronix e.K. | Marc Kleine-Budde | > Embedded Linux | https://www.pengutronix.de | > Vertretung West/Dortmund | Phone: +49-231-2826-924 | > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | > 63d5320a0c9b9867628a3a5a12e7f11d4cc109c2 is the first fixed commit commit 63d5320a0c9b9867628a3a5a12e7f11d4cc109c2 Author: Paolo Abeni Date: Tue Feb 18 18:15:44 2020 +0100 Revert "net: dev: introduce support for sch BYPASS for lockless qdisc" [ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ] This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc"). I can't find any working solution other than a plain revert. This will introduce some minor performance regressions for pfifo_fast qdisc. I plan to address them in net-next with more indirect call wrapper boilerplate for qdiscs. Reported-by: Ahmad Fatoum Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc") Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman net/core/dev.c | 22 ++ 1 file changed, 2 insertions(+), 20 deletions(-)
Re: canfdtest on flexcan loopback
On 9/17/20 2:59 PM, Vladimir Oltean wrote: > On Wed, Sep 16, 2020 at 03:14:37PM +0300, Vladimir Oltean wrote: >> Nonetheless, you bring up a good point. I'll try to bring into net-next >> the minimum amount of required delta (which seems to be the >> fsl_lx2160a_r1_devtype_data structure only), and I'll re-test. > > So I'm back with some interesting results. > > Test 1: > NXP LSDK 20.04 based on v5.4.3 - reordering reproduces instantly > > Test 2: > net-next 5.9-rc3 with upstream defconfig and all downstream patches > imported from LSDK - hasn't reproduced in 20 hours of testing > > Test 3: > net-next 5.9-rc3 with .config imported from LSDK - hasn't reproduced in > 335 iterations > > Test 4: > linux stable linux-5.4.y - hasn't reproduced in 137 iterations > > Test 5: > torvalds v5.4.3 - reproduces instantly Is there a bug in v5.4.3? v5.4.66 is current latest v5.4.x Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature
Re: canfdtest on flexcan loopback
On Wed, Sep 16, 2020 at 03:14:37PM +0300, Vladimir Oltean wrote: > Nonetheless, you bring up a good point. I'll try to bring into net-next > the minimum amount of required delta (which seems to be the > fsl_lx2160a_r1_devtype_data structure only), and I'll re-test. So I'm back with some interesting results. Test 1: NXP LSDK 20.04 based on v5.4.3 - reordering reproduces instantly Test 2: net-next 5.9-rc3 with upstream defconfig and all downstream patches imported from LSDK - hasn't reproduced in 20 hours of testing Test 3: net-next 5.9-rc3 with .config imported from LSDK - hasn't reproduced in 335 iterations Test 4: linux stable linux-5.4.y - hasn't reproduced in 137 iterations Test 5: torvalds v5.4.3 - reproduces instantly Thanks, -Vladimir
RE: canfdtest on flexcan loopback
> -Original Message- > From: Marc Kleine-Budde > Sent: 2020年9月16日 20:01 > To: Vladimir Oltean > Cc: w...@grandegger.com; Pankaj Bansal ; Pankaj > Bansal (OSS) ; linux-...@vger.kernel.org; > Joakim Zhang ; linux-kernel@vger.kernel.org; > Vladimir Oltean > Subject: Re: canfdtest on flexcan loopback > > On 9/16/20 1:45 PM, Vladimir Oltean wrote: > > On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote: > >> Which driver are you using? The mainline driver only uses one TX buffer. > > > > Are there multiple flexcan drivers in circulation? Yes, the mainline > > driver with a single priv->tx_mb. > > I assume nxp has several patches on their kernels. Are you using the mainline > kernel or the one that's provided by nxp? Hi Marc, Vladimir, Yes, Vladimir uses kernel provided by NXP, I also help try to look into this issue, but it can't be reproduced on i.MX platforms. Our local flexcan driver is almost cherry-picked from linux-can-next/flexcan branch to implement CAN FD mode, which is a version that cleaned up by you before. I confirm that we still use single TX mailbox to send frames, per my understanding, reorder should not occur here. According to Vladimir's description, exactly it happens: "I have added trace points to the end of the flexcan_start_xmit function, which print the entire skb, and the frames appear to be written to the TX message buffer in the correct order. They are seen, however, in the incorrect order on the wire." Since Vladimir only test Classic mode, he can turn to 5.4 upstream kernel to see if this reorder issue also can be reproduced. @Vladimir Oltean, could you please have a try? To easy the test, I think you only need replace below several files at local side, then use "fsl,ls1021ar2-flexcan" compatible string in dts. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/can/flexcan.c?h=v5.4.65 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/can/rx-offload.c?h=v5.4.65 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/can/rx-offload.h?h=v5.4.65 If it can't work, suggest to use 5.4 upstream kernel to reproduce this issue. Best Regards, Joakim Zhang > Marc > > -- > Pengutronix e.K. | Marc Kleine-Budde | > Embedded Linux | https://www.pengutronix.de | > Vertretung West/Dortmund | Phone: +49-231-2826-924 | > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- |
Re: canfdtest on flexcan loopback
On Wed 9/2/2020 10:09 AM, Wolfgang Grandegger wrote: > canfdtest normally runs on the DUT *and* a the host. The DUT receives > the messages from the host, increments the frame data bytes and then > sends them back to the host. With "loopback" mode, the data bytes are > not incremented and that's what you see above. > > Wolfgang Wolfgang is of course right, but we're nonetheless investigating what seems to be a real problem, and what Pankaj had seen was a red herring. So currently what I suspect is going on, when I am running canfdtest between 2 LS1028A-RDB boards, is that the DUT is reordering frames on TX. See, for example, the screenshot below: https://drive.google.com/file/d/1rOeW3aXh3kPh1CJ39lCccRfjFz5JN5I6/view?usp=sharing I have added trace points to the end of the flexcan_start_xmit function, which print the entire skb, and the frames appear to be written to the TX message buffer in the correct order. They are seen, however, in the incorrect order on the wire. Thanks, -Vladimir
Re: canfdtest on flexcan loopback
On 9/16/20 1:04 PM, Vladimir Oltean wrote: > [ resending, forgot to copy Wolfgang ] > > On Wed 9/2/2020 10:09 AM, Wolfgang Grandegger wrote: >> canfdtest normally runs on the DUT *and* a the host. The DUT receives >> the messages from the host, increments the frame data bytes and then >> sends them back to the host. With "loopback" mode, the data bytes are >> not incremented and that's what you see above. >> >> Wolfgang > > Wolfgang is of course right, but we're nonetheless investigating what > seems to be a real problem, and what Pankaj had seen was a red herring. > > So currently what I suspect is going on, when I am running canfdtest > between 2 LS1028A-RDB boards, is that the DUT is reordering frames on > TX. > > See, for example, the screenshot below: > https://drive.google.com/file/d/1rOeW3aXh3kPh1CJ39lCccRfjFz5JN5I6/view?usp=sharing > > I have added trace points to the end of the flexcan_start_xmit function, > which print the entire skb, and the frames appear to be written to the > TX message buffer in the correct order. They are seen, however, in the > incorrect order on the wire. Which driver are you using? The mainline driver only uses one TX buffer. Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature
Re: canfdtest on flexcan loopback
On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote: > Which driver are you using? The mainline driver only uses one TX buffer. Are there multiple flexcan drivers in circulation? Yes, the mainline driver with a single priv->tx_mb. Thanks, -Vladimir
Re: canfdtest on flexcan loopback
[ resending, forgot to copy Wolfgang ] On Wed 9/2/2020 10:09 AM, Wolfgang Grandegger wrote: > canfdtest normally runs on the DUT *and* a the host. The DUT receives > the messages from the host, increments the frame data bytes and then > sends them back to the host. With "loopback" mode, the data bytes are > not incremented and that's what you see above. > > Wolfgang Wolfgang is of course right, but we're nonetheless investigating what seems to be a real problem, and what Pankaj had seen was a red herring. So currently what I suspect is going on, when I am running canfdtest between 2 LS1028A-RDB boards, is that the DUT is reordering frames on TX. See, for example, the screenshot below: https://drive.google.com/file/d/1rOeW3aXh3kPh1CJ39lCccRfjFz5JN5I6/view?usp=sharing I have added trace points to the end of the flexcan_start_xmit function, which print the entire skb, and the frames appear to be written to the TX message buffer in the correct order. They are seen, however, in the incorrect order on the wire. Thanks, -Vladimir
Re: canfdtest on flexcan loopback
On Wed, Sep 16, 2020 at 02:01:11PM +0200, Marc Kleine-Budde wrote: > On 9/16/20 1:45 PM, Vladimir Oltean wrote: > > On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote: > >> Which driver are you using? The mainline driver only uses one TX buffer. > > > > Are there multiple flexcan drivers in circulation? Yes, the mainline > > driver with a single priv->tx_mb. > > I assume nxp has several patches on their kernels. Are you using the mainline > kernel or the one that's provided by nxp? Ah, ok, that's what you mean. So, yes, I'm using an NXP-provided kernel, just because the NXP maintainers didn't bother to submit the SoC support upstream, apparently. Here's the diff to net-next, some things were added by me for debugging: flexcan_dump_regs() and trace_canfd_frame(). I am using the fsl_lx2160a_r1_devtype_data structure, looking at the delta it seems to me like most of the additions (CAN FD) should be bypassed because I have commented out FLEXCAN_QUIRK_USE_OFF_TIMESTAMP | FLEXCAN_QUIRK_TIMESTAMP_SUPPORT_FD. Nonetheless, you bring up a good point. I'll try to bring into net-next the minimum amount of required delta (which seems to be the fsl_lx2160a_r1_devtype_data structure only), and I'll re-test. --- net-next/drivers/net/can/flexcan.c 2020-05-20 14:15:30.365471681 +0300 +++ qoriq-linux/drivers/net/can/flexcan.c 2020-09-16 12:16:49.362992279 +0300 @@ -6,6 +6,7 @@ // Copyright (c) 2009 Sascha Hauer, Pengutronix // Copyright (c) 2010-2017 Pengutronix, Marc Kleine-Budde // Copyright (c) 2014 David Jander, Protonic Holland +// Copyright 2015, 2018 NXP // // Based on code originally by Andrey Volkov @@ -26,7 +27,14 @@ #include #include #include +#include #include +#include + +#ifdef CONFIG_IMX_SCU_SOC +#include +#include +#endif #define DRV_NAME "flexcan" @@ -52,6 +60,7 @@ #define FLEXCAN_MCR_IRMQ BIT(16) #define FLEXCAN_MCR_LPRIO_EN BIT(13) #define FLEXCAN_MCR_AENBIT(12) +#define FLEXCAN_MCR_FDEN BIT(11) /* MCR_MAXMB: maximum used MBs is MAXMB + 1 */ #define FLEXCAN_MCR_MAXMB(x) ((x) & 0x7f) #define FLEXCAN_MCR_IDAM_A (0x0 << 8) @@ -91,6 +100,7 @@ #define FLEXCAN_CTRL2_MRP BIT(18) #define FLEXCAN_CTRL2_RRS BIT(17) #define FLEXCAN_CTRL2_EACENBIT(16) +#define FLEXCAN_CTRL2_ISOCANFDEN BIT(12) /* FLEXCAN memory error control register (MECR) bits */ #define FLEXCAN_MECR_ECRWRDIS BIT(31) @@ -134,8 +144,30 @@ (FLEXCAN_ESR_ERR_BUS | FLEXCAN_ESR_ERR_STATE) #define FLEXCAN_ESR_ALL_INT \ (FLEXCAN_ESR_TWRN_INT | FLEXCAN_ESR_RWRN_INT | \ -FLEXCAN_ESR_BOFF_INT | FLEXCAN_ESR_ERR_INT | \ -FLEXCAN_ESR_WAK_INT) +FLEXCAN_ESR_BOFF_INT | FLEXCAN_ESR_ERR_INT) + +/* FLEXCAN Bit Timing register (CBT) bits */ +#define FLEXCAN_CBT_BTFBIT(31) +#define FLEXCAN_CBT_EPRESDIV(x)(((x) & 0x3ff) << 21) +#define FLEXCAN_CBT_ERJW(x)(((x) & 0x0f) << 16) +#define FLEXCAN_CBT_EPROPSEG(x)(((x) & 0x3f) << 10) +#define FLEXCAN_CBT_EPSEG1(x) (((x) & 0x1f) << 5) +#define FLEXCAN_CBT_EPSEG2(x) ((x) & 0x1f) + +/* FLEXCAN FD control register (FDCTRL) bits */ +#define FLEXCAN_FDCTRL_FDRATE BIT(31) +#define FLEXCAN_FDCTRL_TDCEN BIT(15) +#define FLEXCAN_FDCTRL_TDCFAIL BIT(14) +#define FLEXCAN_FDCTRL_MBDSR1(x) (((x) & 0x3) << 19) +#define FLEXCAN_FDCTRL_MBDSR0(x) (((x) & 0x3) << 16) +#define FLEXCAN_FDCTRL_TDCOFF(x) (((x) & 0x1f) << 8) + +/* FLEXCAN FD Bit Timing register (FDCBT) bits */ +#define FLEXCAN_FDCBT_FPRESDIV(x) (((x) & 0x3ff) << 20) +#define FLEXCAN_FDCBT_FRJW(x) (((x) & 0x07) << 16) +#define FLEXCAN_FDCBT_FPROPSEG(x) (((x) & 0x1f) << 10) +#define FLEXCAN_FDCBT_FPSEG1(x)(((x) & 0x07) << 5) +#define FLEXCAN_FDCBT_FPSEG2(x)((x) & 0x07) /* FLEXCAN interrupt flag register (IFLAG) bits */ /* Errata ERR005829 step7: Reserve first valid MB */ @@ -161,6 +193,9 @@ #define FLEXCAN_MB_CODE_TX_DATA(0xc << 24) #define FLEXCAN_MB_CODE_TX_TANSWER (0xe << 24) +#define FLEXCAN_MB_CNT_EDL BIT(31) +#define FLEXCAN_MB_CNT_BRS BIT(30) +#define FLEXCAN_MB_CNT_ESI BIT(29) #define FLEXCAN_MB_CNT_SRR BIT(22) #define FLEXCAN_MB_CNT_IDE BIT(21) #define FLEXCAN_MB_CNT_RTR BIT(20) @@ -172,15 +207,17 @@ /* FLEXCAN hardware feature flags * * Below is some version info we got: - *SOC Version IP-Version Glitch- [TR]WRN_INT IRQ Err Memory err RTR re- - *Filter? connected? Passive detection ception in MB - * MX25 FlexCAN2 03.00.00.00 nonono nono - * MX28 FlexCAN2 03.00.04.00yes yesno nono - * MX35 FlexCAN2 03.00.00.00
Re: canfdtest on flexcan loopback
On 9/16/20 1:45 PM, Vladimir Oltean wrote: > On Wed, Sep 16, 2020 at 01:32:49PM +0200, Marc Kleine-Budde wrote: >> Which driver are you using? The mainline driver only uses one TX buffer. > > Are there multiple flexcan drivers in circulation? Yes, the mainline > driver with a single priv->tx_mb. I assume nxp has several patches on their kernels. Are you using the mainline kernel or the one that's provided by nxp? Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature