Re: fsl_espi errors on v5.7.15

2020-09-07 Thread Joakim Tjernlund
[SNIP]
> > 

> > > Would you be able to ftrace the interrupt handler function and see if you
> > > can see a difference in number or timing of interrupts? I'm at a bit of
> > > a loss.
> > 
> > I tried ftrace but I really wasn't sure what I was looking for.
> > Capturing a "bad" case was pretty tricky. But I think I've identified a
> > fix (I'll send it as a proper patch shortly). The gist is
> > 
> > diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c
> > index 7e7c92cafdbb..cb120b68c0e2 100644
> > --- a/drivers/spi/spi-fsl-espi.c
> > +++ b/drivers/spi/spi-fsl-espi.c
> > @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi
> > *espi, u32 events)
> >   static irqreturn_t fsl_espi_irq(s32 irq, void *context_data)
> >   {
> >  struct fsl_espi *espi = context_data;
> > -   u32 events;
> > +   u32 events, mask;
> > 
> >  spin_lock(>lock);
> > 
> >  /* Get interrupt events(tx/rx) */
> >  events = fsl_espi_read_reg(espi, ESPI_SPIE);
> > -   if (!events) {
> > +   mask = fsl_espi_read_reg(espi, ESPI_SPIM);
> > +   if (!(events & mask)) {
> >  spin_unlock(>lock);
> >  return IRQ_NONE;
> >  }
> > 
> > The SPIE register contains the TXCNT so events is pretty much always
> > going to have something set. By checking events against what we've
> > actually requested interrupts for we don't see any spurious events.
> > 
> > I've tested this on the T2080RDB and on our custom hardware and it seems
> > to resolve the problem.
> > 
> 
> I looked at the fsl_espi_irq() too and noticed that clearing of the IRQ events
> are after processing TX/RX. That looks a bit odd to me.

I should have been more specific. I think you can loose IRQs as fsl_espi_irq() 
works now.
Consider this:
1) You get TX IRQ and enter fsl_espi_irq()
2) Enter fsl_espi_fill_tx_fifo() to process any chars until done.
3) Now you get one more TX IRQ
4) fsl_espi_irq() clear events -> IRQ from 3) is lost.

 Jocke


Re: fsl_espi errors on v5.7.15

2020-09-07 Thread Joakim Tjernlund
On Thu, 2020-09-03 at 23:58 +, Chris Packham wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
> 
> 
> On 1/09/20 6:14 pm, Nicholas Piggin wrote:
> > Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
> > > On 1/09/20 12:33 am, Heiner Kallweit wrote:
> > > > On 30.08.2020 23:59, Chris Packham wrote:
> > > > > On 31/08/20 9:41 am, Heiner Kallweit wrote:
> > > > > > On 30.08.2020 23:00, Chris Packham wrote:
> > > > > > > On 31/08/20 12:30 am, Nicholas Piggin wrote:
> > > > > > > > Excerpts from Chris Packham's message of August 28, 2020 8:07 
> > > > > > > > am:
> > > > > > > 
> > > > > > > 
> > > > > > > > > > > > > I've also now seen the RX FIFO not empty error on the 
> > > > > > > > > > > > > T2080RDB
> > > > > > > > > > > > > 
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON 
> > > > > > > > > > > > > isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON 
> > > > > > > > > > > > > isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON 
> > > > > > > > > > > > > isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON 
> > > > > > > > > > > > > isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but rx/tx 
> > > > > > > > > > > > > fifo's aren't empty!
> > > > > > > > > > > > > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 
> > > > > > > > > > > > > 32
> > > > > > > > > > > > > 
> > > > > > > > > > > > > With my current workaround of emptying the RX FIFO. 
> > > > > > > > > > > > > It seems
> > > > > > > > > > > > > survivable. Interestingly it only ever seems to be 1 
> > > > > > > > > > > > > extra byte in the
> > > > > > > > > > > > > RX FIFO and it seems to be after either a READ_SR or 
> > > > > > > > > > > > > a READ_FSR.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > fsl_espi ffe11.spi: tx 70
> > > > > > > > > > > > > fsl_espi ffe11.spi: rx 03
> > > > > > > > > > > > > fsl_espi ffe11.spi: Extra RX 00
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON 
> > > > > > > > > > > > > isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but rx/tx 
> > > > > > > > > > > > > fifo's aren't empty!
> > > > > > > > > > > > > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 
> > > > > > > > > > > > > 32
> > > > > > > > > > > > > fsl_espi ffe11.spi: tx 05
> > > > > > > > > > > > > fsl_espi ffe11.spi: rx 00
> > > > > > > > > > > > > fsl_espi ffe11.spi: Extra RX 03
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but SPIE_DON 
> > > > > > > > > > > > > isn't set!
> > > > > > > > > > > > > fsl_espi ffe11.spi: Transfer done but rx/tx 
> > > > > > > > > > > > > fifo's aren't empty!
> > > > > > > > > > > > > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 
> > > > > > > > > > > > > 32
> > > > > > > > > > > > > fsl_espi ffe11.spi: tx 05
> > > > > > > > > > > > > fsl_espi ffe11.spi: rx 00
> > > > > > > > > > > > > fsl_espi ffe11.spi: Extra RX 03
> > > > > > > > > > > > > 
> > > > > > > > > > > > >  From all the Micron SPI-NOR datasheets I've got 
> > > > > > > > > > > > > access to it is
> > > > > > > > > > > > > possible to continually read the SR/FSR. But I've no 
> > > > > > > > > > > > > idea why it
> > > > > > > > > > > > > happens some times and not others.
> > > > > > > > > > > > So I think I've got a reproduction and I think I've 
> > > > > > > > > > > > bisected the problem
> > > > > > > > > > > > to commit 3282a3da25bd ("powerpc/64: Implement soft 
> > > > > > > > > > > > interrupt replay in
> > > > > > > > > > > > C"). My day is just finishing now so I haven't applied 
> > > > > > > > > > > > too much scrutiny
> > > > > > > > > > > > to this result. Given the various rabbit holes I've 
> > > > > > > > > > > > been down on this
> > > > > > > > > > > > issue already I'd take this information with a good 
> > > > > > > > > > > > degree of skepticism.
> > > > > > > > > > > > 
> > > > > > > > > > > OK, so an easy test should be to re-test with a 5.4 
> > > > > > > > > > > kernel.
> > > > > > > > > > > It doesn't have yet the change you're referring to, and 
> > > > > > > > > > > the fsl-espi driver
> > > > > > > > > > > is basically the same as in 5.7 (just two small changes 
> > > > > > > > > > > in 5.7).
> > > > > > > > > > There's 6cc0c16d82f88 and maybe also other interrupt 
> > > > > > > > > > related patches
> > > > > > > > > > around this time that could affect book E, so it's good if 
> > > > > > > > > > that exact
> > > > > > > > > > patch is confirmed.
> > > > > > > > > My confirmation is basically that I can induce the issue in a 
> > > > > > > > > 5.4 kernel
> > > > > > > > > by cherry-picking 3282a3da25bd. I'm also able to "fix" the 
> > > > > > > > > issue in
> > > > > > > > > 

Re: fsl_espi errors on v5.7.15

2020-09-06 Thread Chris Packham

On 5/09/20 5:23 am, Heiner Kallweit wrote:


On Fri 4. Sep 2020 at 01:58, Chris Packham 
mailto:chris.pack...@alliedtelesis.co.nz>> 
wrote:


On 1/09/20 6:14 pm, Nicholas Piggin wrote:

> Excerpts from Chris Packham's message of September 1, 2020 11:25 am:

>> On 1/09/20 12:33 am, Heiner Kallweit wrote:

>>> On 30.08.2020 23:59, Chris Packham wrote:

 On 31/08/20 9:41 am, Heiner Kallweit wrote:

> On 30.08.2020 23:00, Chris Packham wrote:

>> On 31/08/20 12:30 am, Nicholas Piggin wrote:

>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:

>> 

>>

 I've also now seen the RX FIFO not empty error on the T2080RDB



 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!

 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32



 With my current workaround of emptying the RX FIFO. It seems

 survivable. Interestingly it only ever seems to be 1 extra byte in 
 the

 RX FIFO and it seems to be after either a READ_SR or a READ_FSR.



 fsl_espi ffe11.spi: tx 70

 fsl_espi ffe11.spi: rx 03

 fsl_espi ffe11.spi: Extra RX 00

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!

 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32

 fsl_espi ffe11.spi: tx 05

 fsl_espi ffe11.spi: rx 00

 fsl_espi ffe11.spi: Extra RX 03

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!

 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32

 fsl_espi ffe11.spi: tx 05

 fsl_espi ffe11.spi: rx 00

 fsl_espi ffe11.spi: Extra RX 03



  From all the Micron SPI-NOR datasheets I've got access to it 
 is

 possible to continually read the SR/FSR. But I've no idea why it

 happens some times and not others.

>>> So I think I've got a reproduction and I think I've bisected the 
>>> problem

>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt 
>>> replay in

>>> C"). My day is just finishing now so I haven't applied too much 
>>> scrutiny

>>> to this result. Given the various rabbit holes I've been down on 
>>> this

>>> issue already I'd take this information with a good degree of 
>>> skepticism.

>>>

>> OK, so an easy test should be to re-test with a 5.4 kernel.

>> It doesn't have yet the change you're referring to, and the fsl-espi 
>> driver

>> is basically the same as in 5.7 (just two small changes in 5.7).

> There's 6cc0c16d82f88 and maybe also other interrupt related patches

> around this time that could affect book E, so it's good if that exact

> patch is confirmed.

 My confirmation is basically that I can induce the issue in a 5.4 
 kernel

 by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in

 5.9-rc2 by reverting that one commit.



 I both cases it's not exactly a clean cherry-pick/revert so I also

 confirmed the bisection result by building at 3282a3da25bd (which sees

 the issue) and the commit just before (which does not).

>>> Thanks for testing, that confirms it well.

>>>

>>> [snip patch]

>>>

 I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG

 didn't report anything (either with or without the change above).

>>> Okay, it was a bit of a shot in the dark. I still can't see what

>>> else has changed.

>>>

>>> What would cause this, a lost interrupt? A spurious interrupt? Or

>>> higher interrupt latency?

>>>

>>> I don't think the patch should cause significantly worse latency,

>>> (it's supposed to be a bit better if anything because it doesn't set

>>> up the full interrupt frame). But it's possible.

>> My working theory is that the SPI_DON indication is all about the TX

>> direction an now that the interrupts are faster we're hitting an error

>> because there is still RX activity going on. Heiner disagrees with my

>> interpretation of the SPI_DON indication and the fact that it doesn't

>> happen 

Re: fsl_espi errors on v5.7.15

2020-09-06 Thread Chris Packham
(resend as something decided to insert html, some context stripped)

On 5/09/20 5:23 am, Heiner Kallweit wrote:
> On Fri 4. Sep 2020 at 01:58, Chris Packham 
>  > wrote:
>
>

>
>
> I tried ftrace but I really wasn't sure what I was looking for.
>
> Capturing a "bad" case was pretty tricky. But I think I've
> identified a
>
> fix (I'll send it as a proper patch shortly). The gist is
>
>
>
> diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c
>
> index 7e7c92cafdbb..cb120b68c0e2 100644
>
> --- a/drivers/spi/spi-fsl-espi.c
>
> +++ b/drivers/spi/spi-fsl-espi.c
>
> @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi
>
> *espi, u32 events)
>
>   static irqreturn_t fsl_espi_irq(s32 irq, void *context_data)
>
>   {
>
>  struct fsl_espi *espi = context_data;
>
> -   u32 events;
>
> +   u32 events, mask;
>
>
>
>  spin_lock(>lock);
>
>
>
>  /* Get interrupt events(tx/rx) */
>
>  events = fsl_espi_read_reg(espi, ESPI_SPIE);
>
> -   if (!events) {
>
> +   mask = fsl_espi_read_reg(espi, ESPI_SPIM);
>
> +   if (!(events & mask)) {
>
>  spin_unlock(>lock);
>
>  return IRQ_NONE;
>
>  }
>
>
>
> The SPIE register contains the TXCNT so events is pretty much always
>
> going to have something set. By checking events against what we've
>
> actually requested interrupts for we don't see any spurious events.
>
>
> Usually we shouldn’t receive interrupts we’re not interested in, 
> except the interrupt is shared.
My theory is that we get an interrupt to the core for RXT and another 
for DON. With the changes to the powerpc interrupt handling and the fact 
that fsl_espi_cpu_irq() doesn't actually look at the specific event bits 
means that sometimes both events are handled in the processing of the 
first interrupt and the second one trips the SPI_DON not set error.

There's an old comment "SPI bus sometimes got lost interrupts..." which 
makes me wonder if the interrupt handling change has fixed this original 
problem.

I still think the change makes sense regardless because the SPIE 
register has some counter fields in it.

> This leads to the question: is the SPI interrupt shared with another 
> device on your system?
It's not shared on either the custom board or the T2080RDB.
> Do you see spurious interrupts with the patch under 
> /proc/irq/(irq)/spurious?

Yes it looks like it

[root@linuxbox ~]# cat /proc/irq/53/spurious
count 3126
unhandled 0
last_unhandled 0 ms

[root@linuxbox ~]# /flash/dd_test.sh

[root@linuxbox ~]# cat /proc/irq/53/spurious
count 1016
unhandled 0
last_unhandled 4294746100 ms

[root@linuxbox ~]# /flash/dd_test.sh

[root@linuxbox ~]# cat /proc/irq/53/spurious
count 88391
unhandled 0
last_unhandled 4294746100 ms

[root@linuxbox ~]# /flash/dd_test.sh

[root@linuxbox ~]# cat /proc/irq/53/spurious
count 72459
unhandled 2
last_unhandled 4294758632 ms
>
>
>
> I've tested this on the T2080RDB and on our custom hardware and it
> seems
>
> to resolve the problem.
>
>
>

Re: fsl_espi errors on v5.7.15

2020-09-05 Thread Heiner Kallweit
On Fri 4. Sep 2020 at 01:58, Chris Packham <
chris.pack...@alliedtelesis.co.nz> wrote:

>
>
> On 1/09/20 6:14 pm, Nicholas Piggin wrote:
>
> > Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>
> >> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>
> >>> On 30.08.2020 23:59, Chris Packham wrote:
>
>  On 31/08/20 9:41 am, Heiner Kallweit wrote:
>
> > On 30.08.2020 23:00, Chris Packham wrote:
>
> >> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>
> >>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>
> >> 
>
> >>
>
>  I've also now seen the RX FIFO not empty error on the T2080RDB
>
> 
>
>  fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
>  fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
>  fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
>  fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
>  fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't
> empty!
>
>  fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> 
>
>  With my current workaround of emptying the RX FIFO. It seems
>
>  survivable. Interestingly it only ever seems to be 1 extra
> byte in the
>
>  RX FIFO and it seems to be after either a READ_SR or a
> READ_FSR.
>
> 
>
>  fsl_espi ffe11.spi: tx 70
>
>  fsl_espi ffe11.spi: rx 03
>
>  fsl_espi ffe11.spi: Extra RX 00
>
>  fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
>  fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't
> empty!
>
>  fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
>  fsl_espi ffe11.spi: tx 05
>
>  fsl_espi ffe11.spi: rx 00
>
>  fsl_espi ffe11.spi: Extra RX 03
>
>  fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
>  fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't
> empty!
>
>  fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
>  fsl_espi ffe11.spi: tx 05
>
>  fsl_espi ffe11.spi: rx 00
>
>  fsl_espi ffe11.spi: Extra RX 03
>
> 
>
>   From all the Micron SPI-NOR datasheets I've got access
> to it is
>
>  possible to continually read the SR/FSR. But I've no idea why
> it
>
>  happens some times and not others.
>
> >>> So I think I've got a reproduction and I think I've bisected
> the problem
>
> >>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt
> replay in
>
> >>> C"). My day is just finishing now so I haven't applied too
> much scrutiny
>
> >>> to this result. Given the various rabbit holes I've been down
> on this
>
> >>> issue already I'd take this information with a good degree of
> skepticism.
>
> >>>
>
> >> OK, so an easy test should be to re-test with a 5.4 kernel.
>
> >> It doesn't have yet the change you're referring to, and the
> fsl-espi driver
>
> >> is basically the same as in 5.7 (just two small changes in 5.7).
>
> > There's 6cc0c16d82f88 and maybe also other interrupt related
> patches
>
> > around this time that could affect book E, so it's good if that
> exact
>
> > patch is confirmed.
>
>  My confirmation is basically that I can induce the issue in a 5.4
> kernel
>
>  by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue
> in
>
>  5.9-rc2 by reverting that one commit.
>
> 
>
>  I both cases it's not exactly a clean cherry-pick/revert so I also
>
>  confirmed the bisection result by building at 3282a3da25bd (which
> sees
>
>  the issue) and the commit just before (which does not).
>
> >>> Thanks for testing, that confirms it well.
>
> >>>
>
> >>> [snip patch]
>
> >>>
>
>  I still saw the issue with this change applied.
> PPC_IRQ_SOFT_MASK_DEBUG
>
>  didn't report anything (either with or without the change above).
>
> >>> Okay, it was a bit of a shot in the dark. I still can't see what
>
> >>> else has changed.
>
> >>>
>
> >>> What would cause this, a lost interrupt? A spurious interrupt? Or
>
> >>> higher interrupt latency?
>
> >>>
>
> >>> I don't think the patch should cause significantly worse latency,
>
> >>> (it's supposed to be a bit better if anything because it doesn't
> set
>
> >>> up the full interrupt frame). But it's possible.
>
> >> My working theory is that the SPI_DON indication is all about the TX
>
> >> direction an now that the interrupts are faster we're hitting an
> error
>
> >> because there is still RX activity going on. Heiner disagrees with
> my
>
> >> 

Re: fsl_espi errors on v5.7.15

2020-09-03 Thread Chris Packham

On 1/09/20 6:14 pm, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:59, Chris Packham wrote:
 On 31/08/20 9:41 am, Heiner Kallweit wrote:
> On 30.08.2020 23:00, Chris Packham wrote:
>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>> 
>>
 I've also now seen the RX FIFO not empty error on the T2080RDB

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32

 With my current workaround of emptying the RX FIFO. It seems
 survivable. Interestingly it only ever seems to be 1 extra byte in 
 the
 RX FIFO and it seems to be after either a READ_SR or a READ_FSR.

 fsl_espi ffe11.spi: tx 70
 fsl_espi ffe11.spi: rx 03
 fsl_espi ffe11.spi: Extra RX 00
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03

  From all the Micron SPI-NOR datasheets I've got access to it 
 is
 possible to continually read the SR/FSR. But I've no idea why it
 happens some times and not others.
>>> So I think I've got a reproduction and I think I've bisected the 
>>> problem
>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt 
>>> replay in
>>> C"). My day is just finishing now so I haven't applied too much 
>>> scrutiny
>>> to this result. Given the various rabbit holes I've been down on 
>>> this
>>> issue already I'd take this information with a good degree of 
>>> skepticism.
>>>
>> OK, so an easy test should be to re-test with a 5.4 kernel.
>> It doesn't have yet the change you're referring to, and the fsl-espi 
>> driver
>> is basically the same as in 5.7 (just two small changes in 5.7).
> There's 6cc0c16d82f88 and maybe also other interrupt related patches
> around this time that could affect book E, so it's good if that exact
> patch is confirmed.
 My confirmation is basically that I can induce the issue in a 5.4 
 kernel
 by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
 5.9-rc2 by reverting that one commit.

 I both cases it's not exactly a clean cherry-pick/revert so I also
 confirmed the bisection result by building at 3282a3da25bd (which sees
 the issue) and the commit just before (which does not).
>>> Thanks for testing, that confirms it well.
>>>
>>> [snip patch]
>>>
 I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
 didn't report anything (either with or without the change above).
>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>> else has changed.
>>>
>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>> higher interrupt latency?
>>>
>>> I don't think the patch should cause significantly worse latency,
>>> (it's supposed to be a bit better if anything because it doesn't set
>>> up the full interrupt frame). But it's possible.
>> My working theory is that the SPI_DON indication is all about the TX
>> direction an now that the interrupts are faster we're hitting an error
>> because there is still RX activity going on. Heiner disagrees with my
>> interpretation of the SPI_DON indication and the fact that it doesn't
>> happen every time does throw doubt on it.
>>
> It's right that the eSPI spec can be interpreted that SPI_DON refers to
> TX only. However this wouldn't really make sense, because also for RX
> we program the frame 

Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Chris Packham

On 2/09/20 11:29 am, Chris Packham wrote:
>
> On 1/09/20 6:14 pm, Nicholas Piggin wrote:
>> Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>>> On 1/09/20 12:33 am, Heiner Kallweit wrote:
 On 30.08.2020 23:59, Chris Packham wrote:
> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>> On 30.08.2020 23:00, Chris Packham wrote:
>>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
 Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>>> 
>>>
> I've also now seen the RX FIFO not empty error on the 
> T2080RDB
>
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's 
> aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> With my current workaround of emptying the RX FIFO. It seems
> survivable. Interestingly it only ever seems to be 1 extra 
> byte in the
> RX FIFO and it seems to be after either a READ_SR or a 
> READ_FSR.
>
> fsl_espi ffe11.spi: tx 70
> fsl_espi ffe11.spi: rx 03
> fsl_espi ffe11.spi: Extra RX 00
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's 
> aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: tx 05
> fsl_espi ffe11.spi: rx 00
> fsl_espi ffe11.spi: Extra RX 03
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's 
> aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: tx 05
> fsl_espi ffe11.spi: rx 00
> fsl_espi ffe11.spi: Extra RX 03
>
>  From all the Micron SPI-NOR datasheets I've got 
> access to it is
> possible to continually read the SR/FSR. But I've no idea 
> why it
> happens some times and not others.
 So I think I've got a reproduction and I think I've 
 bisected the problem
 to commit 3282a3da25bd ("powerpc/64: Implement soft 
 interrupt replay in
 C"). My day is just finishing now so I haven't applied too 
 much scrutiny
 to this result. Given the various rabbit holes I've been 
 down on this
 issue already I'd take this information with a good degree 
 of skepticism.

>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>>> It doesn't have yet the change you're referring to, and the 
>>> fsl-espi driver
>>> is basically the same as in 5.7 (just two small changes in 
>>> 5.7).
>> There's 6cc0c16d82f88 and maybe also other interrupt related 
>> patches
>> around this time that could affect book E, so it's good if 
>> that exact
>> patch is confirmed.
> My confirmation is basically that I can induce the issue in a 
> 5.4 kernel
> by cherry-picking 3282a3da25bd. I'm also able to "fix" the 
> issue in
> 5.9-rc2 by reverting that one commit.
>
> I both cases it's not exactly a clean cherry-pick/revert so I 
> also
> confirmed the bisection result by building at 3282a3da25bd 
> (which sees
> the issue) and the commit just before (which does not).
 Thanks for testing, that confirms it well.

 [snip patch]

> I still saw the issue with this change applied. 
> PPC_IRQ_SOFT_MASK_DEBUG
> didn't report anything (either with or without the change above).
 Okay, it was a bit of a shot in the dark. I still can't see what
 else has changed.

 What would cause this, a lost interrupt? A spurious interrupt? Or
 higher interrupt latency?

 I don't think the patch should cause significantly worse latency,
 (it's supposed to be a bit better if anything because it 
 doesn't set
 up the full interrupt frame). But it's possible.
>>> My working theory is that the SPI_DON indication is all about 
>>> the TX
>>> direction an now that the interrupts are faster we're hitting an 
>>> error
>>> because there is still RX activity going on. Heiner disagrees 
>>> with my
>>> 

Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Chris Packham

On 1/09/20 6:14 pm, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
>> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:59, Chris Packham wrote:
 On 31/08/20 9:41 am, Heiner Kallweit wrote:
> On 30.08.2020 23:00, Chris Packham wrote:
>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>> 
>>
 I've also now seen the RX FIFO not empty error on the T2080RDB

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32

 With my current workaround of emptying the RX FIFO. It seems
 survivable. Interestingly it only ever seems to be 1 extra byte in 
 the
 RX FIFO and it seems to be after either a READ_SR or a READ_FSR.

 fsl_espi ffe11.spi: tx 70
 fsl_espi ffe11.spi: rx 03
 fsl_espi ffe11.spi: Extra RX 00
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03

  From all the Micron SPI-NOR datasheets I've got access to it 
 is
 possible to continually read the SR/FSR. But I've no idea why it
 happens some times and not others.
>>> So I think I've got a reproduction and I think I've bisected the 
>>> problem
>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt 
>>> replay in
>>> C"). My day is just finishing now so I haven't applied too much 
>>> scrutiny
>>> to this result. Given the various rabbit holes I've been down on 
>>> this
>>> issue already I'd take this information with a good degree of 
>>> skepticism.
>>>
>> OK, so an easy test should be to re-test with a 5.4 kernel.
>> It doesn't have yet the change you're referring to, and the fsl-espi 
>> driver
>> is basically the same as in 5.7 (just two small changes in 5.7).
> There's 6cc0c16d82f88 and maybe also other interrupt related patches
> around this time that could affect book E, so it's good if that exact
> patch is confirmed.
 My confirmation is basically that I can induce the issue in a 5.4 
 kernel
 by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
 5.9-rc2 by reverting that one commit.

 I both cases it's not exactly a clean cherry-pick/revert so I also
 confirmed the bisection result by building at 3282a3da25bd (which sees
 the issue) and the commit just before (which does not).
>>> Thanks for testing, that confirms it well.
>>>
>>> [snip patch]
>>>
 I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
 didn't report anything (either with or without the change above).
>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>> else has changed.
>>>
>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>> higher interrupt latency?
>>>
>>> I don't think the patch should cause significantly worse latency,
>>> (it's supposed to be a bit better if anything because it doesn't set
>>> up the full interrupt frame). But it's possible.
>> My working theory is that the SPI_DON indication is all about the TX
>> direction an now that the interrupts are faster we're hitting an error
>> because there is still RX activity going on. Heiner disagrees with my
>> interpretation of the SPI_DON indication and the fact that it doesn't
>> happen every time does throw doubt on it.
>>
> It's right that the eSPI spec can be interpreted that SPI_DON refers to
> TX only. However this wouldn't really make sense, because also for RX
> we program the frame 

Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Heiner Kallweit
On 01.09.2020 04:52, Chris Packham wrote:
> 
> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>> On 30.08.2020 23:59, Chris Packham wrote:
>>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
 On 30.08.2020 23:00, Chris Packham wrote:
> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
> 
>
>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>
>>> With my current workaround of emptying the RX FIFO. It seems
>>> survivable. Interestingly it only ever seems to be 1 extra byte in 
>>> the
>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>
>>> fsl_espi ffe11.spi: tx 70
>>> fsl_espi ffe11.spi: rx 03
>>> fsl_espi ffe11.spi: Extra RX 00
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>>
>>> From all the Micron SPI-NOR datasheets I've got access to it is
>>> possible to continually read the SR/FSR. But I've no idea why it
>>> happens some times and not others.
>> So I think I've got a reproduction and I think I've bisected the 
>> problem
>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay 
>> in
>> C"). My day is just finishing now so I haven't applied too much 
>> scrutiny
>> to this result. Given the various rabbit holes I've been down on this
>> issue already I'd take this information with a good degree of 
>> skepticism.
>>
> OK, so an easy test should be to re-test with a 5.4 kernel.
> It doesn't have yet the change you're referring to, and the fsl-espi 
> driver
> is basically the same as in 5.7 (just two small changes in 5.7).
 There's 6cc0c16d82f88 and maybe also other interrupt related patches
 around this time that could affect book E, so it's good if that exact
 patch is confirmed.
>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>> 5.9-rc2 by reverting that one commit.
>>>
>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>> the issue) and the commit just before (which does not).
>> Thanks for testing, that confirms it well.
>>
>> [snip patch]
>>
>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>> didn't report anything (either with or without the change above).
>> Okay, it was a bit of a shot in the dark. I still can't see what
>> else has changed.
>>
>> What would cause this, a lost interrupt? A spurious interrupt? Or
>> higher interrupt latency?
>>
>> I don't think the patch should cause significantly worse latency,
>> (it's supposed to be a bit better if anything because it doesn't set
>> up the full interrupt frame). But it's possible.
> My working theory is that the SPI_DON indication is all about the TX
> direction an now that the interrupts are faster we're hitting an error
> because there is still RX activity going on. Heiner disagrees with my
> interpretation of the SPI_DON indication and the fact that it doesn't
> happen every time does throw doubt on it.
>
 It's right that the eSPI spec can be interpreted that SPI_DON refers to
 TX only. However this wouldn't really make sense, because also for RX
 we program the frame length, and therefore want to be notified once the
 full frame was received. Also practical experience shows that SPI_DON
 is set also after RX-only transfers.
 Typical SPI NOR use case is that you write read command + start 

Re: fsl_espi errors on v5.7.15

2020-09-01 Thread Nicholas Piggin
Excerpts from Chris Packham's message of September 1, 2020 11:25 am:
> 
> On 1/09/20 12:33 am, Heiner Kallweit wrote:
>> On 30.08.2020 23:59, Chris Packham wrote:
>>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
 On 30.08.2020 23:00, Chris Packham wrote:
> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
> 
>
>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>
>>> With my current workaround of emptying the RX FIFO. It seems
>>> survivable. Interestingly it only ever seems to be 1 extra byte in 
>>> the
>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>
>>> fsl_espi ffe11.spi: tx 70
>>> fsl_espi ffe11.spi: rx 03
>>> fsl_espi ffe11.spi: Extra RX 00
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>>
>>> From all the Micron SPI-NOR datasheets I've got access to it is
>>> possible to continually read the SR/FSR. But I've no idea why it
>>> happens some times and not others.
>> So I think I've got a reproduction and I think I've bisected the 
>> problem
>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay 
>> in
>> C"). My day is just finishing now so I haven't applied too much 
>> scrutiny
>> to this result. Given the various rabbit holes I've been down on this
>> issue already I'd take this information with a good degree of 
>> skepticism.
>>
> OK, so an easy test should be to re-test with a 5.4 kernel.
> It doesn't have yet the change you're referring to, and the fsl-espi 
> driver
> is basically the same as in 5.7 (just two small changes in 5.7).
 There's 6cc0c16d82f88 and maybe also other interrupt related patches
 around this time that could affect book E, so it's good if that exact
 patch is confirmed.
>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>> 5.9-rc2 by reverting that one commit.
>>>
>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>> the issue) and the commit just before (which does not).
>> Thanks for testing, that confirms it well.
>>
>> [snip patch]
>>
>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>> didn't report anything (either with or without the change above).
>> Okay, it was a bit of a shot in the dark. I still can't see what
>> else has changed.
>>
>> What would cause this, a lost interrupt? A spurious interrupt? Or
>> higher interrupt latency?
>>
>> I don't think the patch should cause significantly worse latency,
>> (it's supposed to be a bit better if anything because it doesn't set
>> up the full interrupt frame). But it's possible.
> My working theory is that the SPI_DON indication is all about the TX
> direction an now that the interrupts are faster we're hitting an error
> because there is still RX activity going on. Heiner disagrees with my
> interpretation of the SPI_DON indication and the fact that it doesn't
> happen every time does throw doubt on it.
>
 It's right that the eSPI spec can be interpreted that SPI_DON refers to
 TX only. However this wouldn't really make sense, because also for RX
 we program the frame length, and therefore want to be notified once the
 full frame was received. Also practical experience shows that SPI_DON
 is set also after RX-only transfers.
 Typical SPI NOR use case is that you 

Re: fsl_espi errors on v5.7.15

2020-08-31 Thread Chris Packham

On 1/09/20 12:33 am, Heiner Kallweit wrote:
> On 30.08.2020 23:59, Chris Packham wrote:
>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:00, Chris Packham wrote:
 On 31/08/20 12:30 am, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
 

>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>
>> With my current workaround of emptying the RX FIFO. It seems
>> survivable. Interestingly it only ever seems to be 1 extra byte in 
>> the
>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>
>> fsl_espi ffe11.spi: tx 70
>> fsl_espi ffe11.spi: rx 03
>> fsl_espi ffe11.spi: Extra RX 00
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: tx 05
>> fsl_espi ffe11.spi: rx 00
>> fsl_espi ffe11.spi: Extra RX 03
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: tx 05
>> fsl_espi ffe11.spi: rx 00
>> fsl_espi ffe11.spi: Extra RX 03
>>
>> From all the Micron SPI-NOR datasheets I've got access to it is
>> possible to continually read the SR/FSR. But I've no idea why it
>> happens some times and not others.
> So I think I've got a reproduction and I think I've bisected the 
> problem
> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay 
> in
> C"). My day is just finishing now so I haven't applied too much 
> scrutiny
> to this result. Given the various rabbit holes I've been down on this
> issue already I'd take this information with a good degree of 
> skepticism.
>
 OK, so an easy test should be to re-test with a 5.4 kernel.
 It doesn't have yet the change you're referring to, and the fsl-espi 
 driver
 is basically the same as in 5.7 (just two small changes in 5.7).
>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>>> around this time that could affect book E, so it's good if that exact
>>> patch is confirmed.
>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>> 5.9-rc2 by reverting that one commit.
>>
>> I both cases it's not exactly a clean cherry-pick/revert so I also
>> confirmed the bisection result by building at 3282a3da25bd (which sees
>> the issue) and the commit just before (which does not).
> Thanks for testing, that confirms it well.
>
> [snip patch]
>
>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>> didn't report anything (either with or without the change above).
> Okay, it was a bit of a shot in the dark. I still can't see what
> else has changed.
>
> What would cause this, a lost interrupt? A spurious interrupt? Or
> higher interrupt latency?
>
> I don't think the patch should cause significantly worse latency,
> (it's supposed to be a bit better if anything because it doesn't set
> up the full interrupt frame). But it's possible.
 My working theory is that the SPI_DON indication is all about the TX
 direction an now that the interrupts are faster we're hitting an error
 because there is still RX activity going on. Heiner disagrees with my
 interpretation of the SPI_DON indication and the fact that it doesn't
 happen every time does throw doubt on it.

>>> It's right that the eSPI spec can be interpreted that SPI_DON refers to
>>> TX only. However this wouldn't really make sense, because also for RX
>>> we program the frame length, and therefore want to be notified once the
>>> full frame was received. Also practical experience shows that SPI_DON
>>> is set also after RX-only transfers.
>>> Typical SPI NOR use case is that you write read command + start address,
>>> followed by a longer read. If the TX-only interpretation would be right,
>>> we'd always end up with SPI_DON not being set.
>>>

Re: fsl_espi errors on v5.7.15

2020-08-31 Thread Chris Packham

On 1/09/20 12:33 am, Heiner Kallweit wrote:
> On 30.08.2020 23:59, Chris Packham wrote:
>> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>>> On 30.08.2020 23:00, Chris Packham wrote:
 On 31/08/20 12:30 am, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
 

>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>
>> With my current workaround of emptying the RX FIFO. It seems
>> survivable. Interestingly it only ever seems to be 1 extra byte in 
>> the
>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>
>> fsl_espi ffe11.spi: tx 70
>> fsl_espi ffe11.spi: rx 03
>> fsl_espi ffe11.spi: Extra RX 00
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: tx 05
>> fsl_espi ffe11.spi: rx 00
>> fsl_espi ffe11.spi: Extra RX 03
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: tx 05
>> fsl_espi ffe11.spi: rx 00
>> fsl_espi ffe11.spi: Extra RX 03
>>
>> From all the Micron SPI-NOR datasheets I've got access to it is
>> possible to continually read the SR/FSR. But I've no idea why it
>> happens some times and not others.
> So I think I've got a reproduction and I think I've bisected the 
> problem
> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay 
> in
> C"). My day is just finishing now so I haven't applied too much 
> scrutiny
> to this result. Given the various rabbit holes I've been down on this
> issue already I'd take this information with a good degree of 
> skepticism.
>
 OK, so an easy test should be to re-test with a 5.4 kernel.
 It doesn't have yet the change you're referring to, and the fsl-espi 
 driver
 is basically the same as in 5.7 (just two small changes in 5.7).
>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>>> around this time that could affect book E, so it's good if that exact
>>> patch is confirmed.
>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>> 5.9-rc2 by reverting that one commit.
>>
>> I both cases it's not exactly a clean cherry-pick/revert so I also
>> confirmed the bisection result by building at 3282a3da25bd (which sees
>> the issue) and the commit just before (which does not).
> Thanks for testing, that confirms it well.
>
> [snip patch]
>
>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>> didn't report anything (either with or without the change above).
> Okay, it was a bit of a shot in the dark. I still can't see what
> else has changed.
>
> What would cause this, a lost interrupt? A spurious interrupt? Or
> higher interrupt latency?
>
> I don't think the patch should cause significantly worse latency,
> (it's supposed to be a bit better if anything because it doesn't set
> up the full interrupt frame). But it's possible.
 My working theory is that the SPI_DON indication is all about the TX
 direction an now that the interrupts are faster we're hitting an error
 because there is still RX activity going on. Heiner disagrees with my
 interpretation of the SPI_DON indication and the fact that it doesn't
 happen every time does throw doubt on it.

>>> It's right that the eSPI spec can be interpreted that SPI_DON refers to
>>> TX only. However this wouldn't really make sense, because also for RX
>>> we program the frame length, and therefore want to be notified once the
>>> full frame was received. Also practical experience shows that SPI_DON
>>> is set also after RX-only transfers.
>>> Typical SPI NOR use case is that you write read command + start address,
>>> followed by a longer read. If the TX-only interpretation would be right,
>>> we'd always end up with SPI_DON not being set.
>>>

Re: fsl_espi errors on v5.7.15

2020-08-31 Thread Heiner Kallweit
On 30.08.2020 23:59, Chris Packham wrote:
> 
> On 31/08/20 9:41 am, Heiner Kallweit wrote:
>> On 30.08.2020 23:00, Chris Packham wrote:
>>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
 Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>>> 
>>>
> I've also now seen the RX FIFO not empty error on the T2080RDB
>
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>
> With my current workaround of emptying the RX FIFO. It seems
> survivable. Interestingly it only ever seems to be 1 extra byte in the
> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>
> fsl_espi ffe11.spi: tx 70
> fsl_espi ffe11.spi: rx 03
> fsl_espi ffe11.spi: Extra RX 00
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: tx 05
> fsl_espi ffe11.spi: rx 00
> fsl_espi ffe11.spi: Extra RX 03
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: tx 05
> fsl_espi ffe11.spi: rx 00
> fsl_espi ffe11.spi: Extra RX 03
>
>From all the Micron SPI-NOR datasheets I've got access to it is
> possible to continually read the SR/FSR. But I've no idea why it
> happens some times and not others.
 So I think I've got a reproduction and I think I've bisected the 
 problem
 to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
 C"). My day is just finishing now so I haven't applied too much 
 scrutiny
 to this result. Given the various rabbit holes I've been down on this
 issue already I'd take this information with a good degree of 
 skepticism.

>>> OK, so an easy test should be to re-test with a 5.4 kernel.
>>> It doesn't have yet the change you're referring to, and the fsl-espi 
>>> driver
>>> is basically the same as in 5.7 (just two small changes in 5.7).
>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>> around this time that could affect book E, so it's good if that exact
>> patch is confirmed.
> My confirmation is basically that I can induce the issue in a 5.4 kernel
> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
> 5.9-rc2 by reverting that one commit.
>
> I both cases it's not exactly a clean cherry-pick/revert so I also
> confirmed the bisection result by building at 3282a3da25bd (which sees
> the issue) and the commit just before (which does not).
 Thanks for testing, that confirms it well.

 [snip patch]

> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
> didn't report anything (either with or without the change above).
 Okay, it was a bit of a shot in the dark. I still can't see what
 else has changed.

 What would cause this, a lost interrupt? A spurious interrupt? Or
 higher interrupt latency?

 I don't think the patch should cause significantly worse latency,
 (it's supposed to be a bit better if anything because it doesn't set
 up the full interrupt frame). But it's possible.
>>> My working theory is that the SPI_DON indication is all about the TX
>>> direction an now that the interrupts are faster we're hitting an error
>>> because there is still RX activity going on. Heiner disagrees with my
>>> interpretation of the SPI_DON indication and the fact that it doesn't
>>> happen every time does throw doubt on it.
>>>
>> It's right that the eSPI spec can be interpreted that SPI_DON refers to
>> TX only. However this wouldn't really make sense, because also for RX
>> we program the frame length, and therefore want to be notified once the
>> full frame was received. Also practical experience shows that SPI_DON
>> is set also after RX-only transfers.
>> Typical SPI NOR use case is that you write read command + start address,
>> followed by a longer read. If the TX-only interpretation would be right,
>> we'd always end up with SPI_DON not being set.
>>
>>> I can't really explain the extra RX byte in the fifo. We know how many
>>> bytes to expect and we pull that many from the fifo so it's not as if
>>> we're 

Re: fsl_espi errors on v5.7.15

2020-08-30 Thread Chris Packham

On 31/08/20 9:41 am, Heiner Kallweit wrote:
> On 30.08.2020 23:00, Chris Packham wrote:
>> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>> 
>>
 I've also now seen the RX FIFO not empty error on the T2080RDB

 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32

 With my current workaround of emptying the RX FIFO. It seems
 survivable. Interestingly it only ever seems to be 1 extra byte in the
 RX FIFO and it seems to be after either a READ_SR or a READ_FSR.

 fsl_espi ffe11.spi: tx 70
 fsl_espi ffe11.spi: rx 03
 fsl_espi ffe11.spi: Extra RX 00
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: tx 05
 fsl_espi ffe11.spi: rx 00
 fsl_espi ffe11.spi: Extra RX 03

From all the Micron SPI-NOR datasheets I've got access to it is
 possible to continually read the SR/FSR. But I've no idea why it
 happens some times and not others.
>>> So I think I've got a reproduction and I think I've bisected the problem
>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>>> C"). My day is just finishing now so I haven't applied too much scrutiny
>>> to this result. Given the various rabbit holes I've been down on this
>>> issue already I'd take this information with a good degree of 
>>> skepticism.
>>>
>> OK, so an easy test should be to re-test with a 5.4 kernel.
>> It doesn't have yet the change you're referring to, and the fsl-espi 
>> driver
>> is basically the same as in 5.7 (just two small changes in 5.7).
> There's 6cc0c16d82f88 and maybe also other interrupt related patches
> around this time that could affect book E, so it's good if that exact
> patch is confirmed.
 My confirmation is basically that I can induce the issue in a 5.4 kernel
 by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
 5.9-rc2 by reverting that one commit.

 I both cases it's not exactly a clean cherry-pick/revert so I also
 confirmed the bisection result by building at 3282a3da25bd (which sees
 the issue) and the commit just before (which does not).
>>> Thanks for testing, that confirms it well.
>>>
>>> [snip patch]
>>>
 I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
 didn't report anything (either with or without the change above).
>>> Okay, it was a bit of a shot in the dark. I still can't see what
>>> else has changed.
>>>
>>> What would cause this, a lost interrupt? A spurious interrupt? Or
>>> higher interrupt latency?
>>>
>>> I don't think the patch should cause significantly worse latency,
>>> (it's supposed to be a bit better if anything because it doesn't set
>>> up the full interrupt frame). But it's possible.
>> My working theory is that the SPI_DON indication is all about the TX
>> direction an now that the interrupts are faster we're hitting an error
>> because there is still RX activity going on. Heiner disagrees with my
>> interpretation of the SPI_DON indication and the fact that it doesn't
>> happen every time does throw doubt on it.
>>
> It's right that the eSPI spec can be interpreted that SPI_DON refers to
> TX only. However this wouldn't really make sense, because also for RX
> we program the frame length, and therefore want to be notified once the
> full frame was received. Also practical experience shows that SPI_DON
> is set also after RX-only transfers.
> Typical SPI NOR use case is that you write read command + start address,
> followed by a longer read. If the TX-only interpretation would be right,
> we'd always end up with SPI_DON not being set.
>
>> I can't really explain the extra RX byte in the fifo. We know how many
>> bytes to expect and we pull that many from the fifo so it's not as if
>> we're missing an interrupt causing us to skip the last byte. I've been
>> looking for some kind of off-by-one calculation but again if it were
>> something like 

Re: fsl_espi errors on v5.7.15

2020-08-30 Thread Heiner Kallweit
On 30.08.2020 23:00, Chris Packham wrote:
> 
> On 31/08/20 12:30 am, Nicholas Piggin wrote:
>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
> 
> 
> 
>>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>>
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>
>>> With my current workaround of emptying the RX FIFO. It seems
>>> survivable. Interestingly it only ever seems to be 1 extra byte in the
>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>>
>>> fsl_espi ffe11.spi: tx 70
>>> fsl_espi ffe11.spi: rx 03
>>> fsl_espi ffe11.spi: Extra RX 00
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: tx 05
>>> fsl_espi ffe11.spi: rx 00
>>> fsl_espi ffe11.spi: Extra RX 03
>>>
>>>   From all the Micron SPI-NOR datasheets I've got access to it is
>>> possible to continually read the SR/FSR. But I've no idea why it
>>> happens some times and not others.
>> So I think I've got a reproduction and I think I've bisected the problem
>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
>> C"). My day is just finishing now so I haven't applied too much scrutiny
>> to this result. Given the various rabbit holes I've been down on this
>> issue already I'd take this information with a good degree of skepticism.
>>
> OK, so an easy test should be to re-test with a 5.4 kernel.
> It doesn't have yet the change you're referring to, and the fsl-espi 
> driver
> is basically the same as in 5.7 (just two small changes in 5.7).
 There's 6cc0c16d82f88 and maybe also other interrupt related patches
 around this time that could affect book E, so it's good if that exact
 patch is confirmed.
>>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>>> 5.9-rc2 by reverting that one commit.
>>>
>>> I both cases it's not exactly a clean cherry-pick/revert so I also
>>> confirmed the bisection result by building at 3282a3da25bd (which sees
>>> the issue) and the commit just before (which does not).
>> Thanks for testing, that confirms it well.
>>
>> [snip patch]
>>
>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>>> didn't report anything (either with or without the change above).
>> Okay, it was a bit of a shot in the dark. I still can't see what
>> else has changed.
>>
>> What would cause this, a lost interrupt? A spurious interrupt? Or
>> higher interrupt latency?
>>
>> I don't think the patch should cause significantly worse latency,
>> (it's supposed to be a bit better if anything because it doesn't set
>> up the full interrupt frame). But it's possible.
> 
> My working theory is that the SPI_DON indication is all about the TX 
> direction an now that the interrupts are faster we're hitting an error 
> because there is still RX activity going on. Heiner disagrees with my 
> interpretation of the SPI_DON indication and the fact that it doesn't 
> happen every time does throw doubt on it.
> 
It's right that the eSPI spec can be interpreted that SPI_DON refers to
TX only. However this wouldn't really make sense, because also for RX
we program the frame length, and therefore want to be notified once the
full frame was received. Also practical experience shows that SPI_DON
is set also after RX-only transfers.
Typical SPI NOR use case is that you write read command + start address,
followed by a longer read. If the TX-only interpretation would be right,
we'd always end up with SPI_DON not being set.

> I can't really explain the extra RX byte in the fifo. We know how many 
> bytes to expect and we pull that many from the fifo so it's not as if 
> we're missing an interrupt causing us to skip the last byte. I've been 
> looking for some kind of off-by-one calculation but again if it were 
> something like that it'd happen all the time.
> 
Maybe it helps to know what value this extra byte in the FIFO has. Is it:
- a duplicate of the last read 

Re: fsl_espi errors on v5.7.15

2020-08-30 Thread Chris Packham

On 31/08/20 12:30 am, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:



>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>
>> With my current workaround of emptying the RX FIFO. It seems
>> survivable. Interestingly it only ever seems to be 1 extra byte in the
>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>
>> fsl_espi ffe11.spi: tx 70
>> fsl_espi ffe11.spi: rx 03
>> fsl_espi ffe11.spi: Extra RX 00
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: tx 05
>> fsl_espi ffe11.spi: rx 00
>> fsl_espi ffe11.spi: Extra RX 03
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: tx 05
>> fsl_espi ffe11.spi: rx 00
>> fsl_espi ffe11.spi: Extra RX 03
>>
>>   From all the Micron SPI-NOR datasheets I've got access to it is
>> possible to continually read the SR/FSR. But I've no idea why it
>> happens some times and not others.
> So I think I've got a reproduction and I think I've bisected the problem
> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in
> C"). My day is just finishing now so I haven't applied too much scrutiny
> to this result. Given the various rabbit holes I've been down on this
> issue already I'd take this information with a good degree of skepticism.
>
 OK, so an easy test should be to re-test with a 5.4 kernel.
 It doesn't have yet the change you're referring to, and the fsl-espi driver
 is basically the same as in 5.7 (just two small changes in 5.7).
>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches
>>> around this time that could affect book E, so it's good if that exact
>>> patch is confirmed.
>> My confirmation is basically that I can induce the issue in a 5.4 kernel
>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in
>> 5.9-rc2 by reverting that one commit.
>>
>> I both cases it's not exactly a clean cherry-pick/revert so I also
>> confirmed the bisection result by building at 3282a3da25bd (which sees
>> the issue) and the commit just before (which does not).
> Thanks for testing, that confirms it well.
>
> [snip patch]
>
>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG
>> didn't report anything (either with or without the change above).
> Okay, it was a bit of a shot in the dark. I still can't see what
> else has changed.
>
> What would cause this, a lost interrupt? A spurious interrupt? Or
> higher interrupt latency?
>
> I don't think the patch should cause significantly worse latency,
> (it's supposed to be a bit better if anything because it doesn't set
> up the full interrupt frame). But it's possible.

My working theory is that the SPI_DON indication is all about the TX 
direction an now that the interrupts are faster we're hitting an error 
because there is still RX activity going on. Heiner disagrees with my 
interpretation of the SPI_DON indication and the fact that it doesn't 
happen every time does throw doubt on it.

I can't really explain the extra RX byte in the fifo. We know how many 
bytes to expect and we pull that many from the fifo so it's not as if 
we're missing an interrupt causing us to skip the last byte. I've been 
looking for some kind of off-by-one calculation but again if it were 
something like that it'd happen all the time.



Re: fsl_espi errors on v5.7.15

2020-08-30 Thread Heiner Kallweit
On 30.08.2020 14:30, Nicholas Piggin wrote:
> Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
>> On 27/08/20 7:12 pm, Nicholas Piggin wrote:
>>> Excerpts from Heiner Kallweit's message of August 26, 2020 4:38 pm:
 On 26.08.2020 08:07, Chris Packham wrote:
> On 26/08/20 1:48 pm, Chris Packham wrote:
>> On 26/08/20 10:22 am, Chris Packham wrote:
>>> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>>>
>>> 
 I've been staring at spi-fsl-espi.c for while now and I think I've
> identified a couple of deficiencies that may or may not be related
> to my
> issue.
>
> First I think the 'Transfer done but SPIE_DON isn't set' message
> can be
> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE
> register.
> We also write back to it to clear the current events. We re-read it in
> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
> naturally end up in that situation if we're doing a large read.
> Consider
> the messages for reading a block of data from a spi-nor chip
>
>     tx = READ_OP + ADDR
>     rx = data
>
> We setup the transfer and pump out the tx_buf. The first interrupt
> goes
> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
> continues until we've received all the data and we finish with
> ESPI_SPIE
> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
> isn't set.
>
> The other deficiency is that we only get an interrupt when the
> amount of
> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
> FSL_ESPI_RXTHR left to be received we will never pull them out of
> the fifo.
>
 SPIM_DON will trigger an interrupt once the last characters have been
 transferred, and read the remaining characters from the FIFO.
>>> The T2080RM that I have says the following about the DON bit
>>>
>>> "Last character was transmitted. The last character was transmitted
>>> and a new command can be written for the next frame."
>>>
>>> That does at least seem to fit with my assertion that it's all about
>>> the TX direction. But the fact that it doesn't happen all the time
>>> throws some doubt on it.
>>>
 I think the reason I'm seeing some variability is because of how fast
> (or slow) the interrupts get processed and how fast the spi-nor
> chip can
> fill the CPUs rx fifo.
>
 To rule out timing issues at high bus frequencies I initially asked
 for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
 or even less, then timing shouldn't be an issue.
>>> Yes I've currently got spi-max-frequency = <100>; in my dts. I
>>> would also expect a slower frequency would fit my "DON is for TX"
>>> narrative.
 Last relevant functional changes have been done almost 4 years ago.
 And yours is the first such report I see. So question is what could
 be so
 special with your setup that it seems you're the only one being
 affected.
 The scenarios you describe are standard, therefore much more people
 should be affected in case of a driver bug.
>>> Agreed. But even on my hardware (which may have a latent issue
>>> despite being in the field for going on 5 years) the issue only
>>> triggers under some fairly specific circumstances.
 You said that kernel config impacts how frequently the issue happens.
 Therefore question is what's the diff in kernel config, and how could
 the differences be related to SPI.
>>> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
>>> impact but every time I found something that seemed to be having an
>>> impact I've been able to disprove it. I actually think its about how
>>> busy the system is which may or may not affect when we get round to
>>> processing the interrupts.
>>>
>>> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to
>>> occur on the T2080RDB.
>>>
>>> I've had to add the following to expose the environment as a mtd
>>> partition
>>>
>>> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> index ff87e67c70da..fbf95fc1fd68 100644
>>> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> @@ -116,6 +116,15 @@ flash@0 {
>>>      compatible = "micron,n25q512ax3",
>>> "jedec,spi-nor";
>>>   

Re: fsl_espi errors on v5.7.15

2020-08-30 Thread Nicholas Piggin
Excerpts from Chris Packham's message of August 28, 2020 8:07 am:
> On 27/08/20 7:12 pm, Nicholas Piggin wrote:
>> Excerpts from Heiner Kallweit's message of August 26, 2020 4:38 pm:
>>> On 26.08.2020 08:07, Chris Packham wrote:
 On 26/08/20 1:48 pm, Chris Packham wrote:
> On 26/08/20 10:22 am, Chris Packham wrote:
>> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>>
>> 
>>> I've been staring at spi-fsl-espi.c for while now and I think I've
 identified a couple of deficiencies that may or may not be related
 to my
 issue.

 First I think the 'Transfer done but SPIE_DON isn't set' message
 can be
 generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE
 register.
 We also write back to it to clear the current events. We re-read it in
 fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
 naturally end up in that situation if we're doing a large read.
 Consider
 the messages for reading a block of data from a spi-nor chip

     tx = READ_OP + ADDR
     rx = data

 We setup the transfer and pump out the tx_buf. The first interrupt
 goes
 off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
 clear ESPI_SPIE and wait for the next interrupt. The next interrupt
 fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
 continues until we've received all the data and we finish with
 ESPI_SPIE
 having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
 isn't set.

 The other deficiency is that we only get an interrupt when the
 amount of
 data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
 FSL_ESPI_RXTHR left to be received we will never pull them out of
 the fifo.

>>> SPIM_DON will trigger an interrupt once the last characters have been
>>> transferred, and read the remaining characters from the FIFO.
>> The T2080RM that I have says the following about the DON bit
>>
>> "Last character was transmitted. The last character was transmitted
>> and a new command can be written for the next frame."
>>
>> That does at least seem to fit with my assertion that it's all about
>> the TX direction. But the fact that it doesn't happen all the time
>> throws some doubt on it.
>>
>>> I think the reason I'm seeing some variability is because of how fast
 (or slow) the interrupts get processed and how fast the spi-nor
 chip can
 fill the CPUs rx fifo.

>>> To rule out timing issues at high bus frequencies I initially asked
>>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>>> or even less, then timing shouldn't be an issue.
>> Yes I've currently got spi-max-frequency = <100>; in my dts. I
>> would also expect a slower frequency would fit my "DON is for TX"
>> narrative.
>>> Last relevant functional changes have been done almost 4 years ago.
>>> And yours is the first such report I see. So question is what could
>>> be so
>>> special with your setup that it seems you're the only one being
>>> affected.
>>> The scenarios you describe are standard, therefore much more people
>>> should be affected in case of a driver bug.
>> Agreed. But even on my hardware (which may have a latent issue
>> despite being in the field for going on 5 years) the issue only
>> triggers under some fairly specific circumstances.
>>> You said that kernel config impacts how frequently the issue happens.
>>> Therefore question is what's the diff in kernel config, and how could
>>> the differences be related to SPI.
>> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
>> impact but every time I found something that seemed to be having an
>> impact I've been able to disprove it. I actually think its about how
>> busy the system is which may or may not affect when we get round to
>> processing the interrupts.
>>
>> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to
>> occur on the T2080RDB.
>>
>> I've had to add the following to expose the environment as a mtd
>> partition
>>
>> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> index ff87e67c70da..fbf95fc1fd68 100644
>> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> @@ -116,6 +116,15 @@ flash@0 {
>>      compatible = "micron,n25q512ax3",
>> "jedec,spi-nor";
>>      reg = <0>;
>>      spi-max-frequency = <1000>; /*
>> input clock */
>> +
>> +

Re: fsl_espi errors on v5.7.15

2020-08-27 Thread Chris Packham
On 27/08/20 7:12 pm, Nicholas Piggin wrote:
> Excerpts from Heiner Kallweit's message of August 26, 2020 4:38 pm:
>> On 26.08.2020 08:07, Chris Packham wrote:
>>> On 26/08/20 1:48 pm, Chris Packham wrote:
 On 26/08/20 10:22 am, Chris Packham wrote:
> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>
> 
>> I've been staring at spi-fsl-espi.c for while now and I think I've
>>> identified a couple of deficiencies that may or may not be related
>>> to my
>>> issue.
>>>
>>> First I think the 'Transfer done but SPIE_DON isn't set' message
>>> can be
>>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE
>>> register.
>>> We also write back to it to clear the current events. We re-read it in
>>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>>> naturally end up in that situation if we're doing a large read.
>>> Consider
>>> the messages for reading a block of data from a spi-nor chip
>>>
>>>     tx = READ_OP + ADDR
>>>     rx = data
>>>
>>> We setup the transfer and pump out the tx_buf. The first interrupt
>>> goes
>>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>>> continues until we've received all the data and we finish with
>>> ESPI_SPIE
>>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>>> isn't set.
>>>
>>> The other deficiency is that we only get an interrupt when the
>>> amount of
>>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>>> FSL_ESPI_RXTHR left to be received we will never pull them out of
>>> the fifo.
>>>
>> SPIM_DON will trigger an interrupt once the last characters have been
>> transferred, and read the remaining characters from the FIFO.
> The T2080RM that I have says the following about the DON bit
>
> "Last character was transmitted. The last character was transmitted
> and a new command can be written for the next frame."
>
> That does at least seem to fit with my assertion that it's all about
> the TX direction. But the fact that it doesn't happen all the time
> throws some doubt on it.
>
>> I think the reason I'm seeing some variability is because of how fast
>>> (or slow) the interrupts get processed and how fast the spi-nor
>>> chip can
>>> fill the CPUs rx fifo.
>>>
>> To rule out timing issues at high bus frequencies I initially asked
>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>> or even less, then timing shouldn't be an issue.
> Yes I've currently got spi-max-frequency = <100>; in my dts. I
> would also expect a slower frequency would fit my "DON is for TX"
> narrative.
>> Last relevant functional changes have been done almost 4 years ago.
>> And yours is the first such report I see. So question is what could
>> be so
>> special with your setup that it seems you're the only one being
>> affected.
>> The scenarios you describe are standard, therefore much more people
>> should be affected in case of a driver bug.
> Agreed. But even on my hardware (which may have a latent issue
> despite being in the field for going on 5 years) the issue only
> triggers under some fairly specific circumstances.
>> You said that kernel config impacts how frequently the issue happens.
>> Therefore question is what's the diff in kernel config, and how could
>> the differences be related to SPI.
> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
> impact but every time I found something that seemed to be having an
> impact I've been able to disprove it. I actually think its about how
> busy the system is which may or may not affect when we get round to
> processing the interrupts.
>
> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to
> occur on the T2080RDB.
>
> I've had to add the following to expose the environment as a mtd
> partition
>
> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> index ff87e67c70da..fbf95fc1fd68 100644
> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> @@ -116,6 +116,15 @@ flash@0 {
>      compatible = "micron,n25q512ax3",
> "jedec,spi-nor";
>      reg = <0>;
>      spi-max-frequency = <1000>; /*
> input clock */
> +
> +   partition@u-boot {
> +    reg = <0x 0x0010>;
> +    

Re: fsl_espi errors on v5.7.15

2020-08-27 Thread Nicholas Piggin
Excerpts from Heiner Kallweit's message of August 26, 2020 4:38 pm:
> On 26.08.2020 08:07, Chris Packham wrote:
>> 
>> On 26/08/20 1:48 pm, Chris Packham wrote:
>>>
>>> On 26/08/20 10:22 am, Chris Packham wrote:
 On 25/08/20 7:22 pm, Heiner Kallweit wrote:

 
> I've been staring at spi-fsl-espi.c for while now and I think I've
>> identified a couple of deficiencies that may or may not be related 
>> to my
>> issue.
>>
>> First I think the 'Transfer done but SPIE_DON isn't set' message 
>> can be
>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE 
>> register.
>> We also write back to it to clear the current events. We re-read it in
>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>> naturally end up in that situation if we're doing a large read. 
>> Consider
>> the messages for reading a block of data from a spi-nor chip
>>
>>    tx = READ_OP + ADDR
>>    rx = data
>>
>> We setup the transfer and pump out the tx_buf. The first interrupt 
>> goes
>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>> continues until we've received all the data and we finish with 
>> ESPI_SPIE
>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>> isn't set.
>>
>> The other deficiency is that we only get an interrupt when the 
>> amount of
>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>> FSL_ESPI_RXTHR left to be received we will never pull them out of 
>> the fifo.
>>
> SPIM_DON will trigger an interrupt once the last characters have been
> transferred, and read the remaining characters from the FIFO.

 The T2080RM that I have says the following about the DON bit

 "Last character was transmitted. The last character was transmitted 
 and a new command can be written for the next frame."

 That does at least seem to fit with my assertion that it's all about 
 the TX direction. But the fact that it doesn't happen all the time 
 throws some doubt on it.

> I think the reason I'm seeing some variability is because of how fast
>> (or slow) the interrupts get processed and how fast the spi-nor 
>> chip can
>> fill the CPUs rx fifo.
>>
> To rule out timing issues at high bus frequencies I initially asked
> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
> or even less, then timing shouldn't be an issue.
 Yes I've currently got spi-max-frequency = <100>; in my dts. I 
 would also expect a slower frequency would fit my "DON is for TX" 
 narrative.
> Last relevant functional changes have been done almost 4 years ago.
> And yours is the first such report I see. So question is what could 
> be so
> special with your setup that it seems you're the only one being 
> affected.
> The scenarios you describe are standard, therefore much more people
> should be affected in case of a driver bug.
 Agreed. But even on my hardware (which may have a latent issue 
 despite being in the field for going on 5 years) the issue only 
 triggers under some fairly specific circumstances.
> You said that kernel config impacts how frequently the issue happens.
> Therefore question is what's the diff in kernel config, and how could
> the differences be related to SPI.

 It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
 impact but every time I found something that seemed to be having an 
 impact I've been able to disprove it. I actually think its about how 
 busy the system is which may or may not affect when we get round to 
 processing the interrupts.

 I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
 occur on the T2080RDB.

 I've had to add the following to expose the environment as a mtd 
 partition

 diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
 b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 index ff87e67c70da..fbf95fc1fd68 100644
 --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 @@ -116,6 +116,15 @@ flash@0 {
     compatible = "micron,n25q512ax3", 
 "jedec,spi-nor";
     reg = <0>;
     spi-max-frequency = <1000>; /* 
 input clock */
 +
 +   partition@u-boot {
 +    reg = <0x 0x0010>;
 +    label = "u-boot";
 +    };
 +    

Re: fsl_espi errors on v5.7.15

2020-08-26 Thread Chris Packham
(adding Nicholas)

On 26/08/20 6:38 pm, Heiner Kallweit wrote:
> On 26.08.2020 08:07, Chris Packham wrote:
>> On 26/08/20 1:48 pm, Chris Packham wrote:
>>> On 26/08/20 10:22 am, Chris Packham wrote:
 On 25/08/20 7:22 pm, Heiner Kallweit wrote:

 
> I've been staring at spi-fsl-espi.c for while now and I think I've
>> identified a couple of deficiencies that may or may not be related
>> to my
>> issue.
>>
>> First I think the 'Transfer done but SPIE_DON isn't set' message
>> can be
>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE
>> register.
>> We also write back to it to clear the current events. We re-read it in
>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>> naturally end up in that situation if we're doing a large read.
>> Consider
>> the messages for reading a block of data from a spi-nor chip
>>
>>     tx = READ_OP + ADDR
>>     rx = data
>>
>> We setup the transfer and pump out the tx_buf. The first interrupt
>> goes
>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>> continues until we've received all the data and we finish with
>> ESPI_SPIE
>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>> isn't set.
>>
>> The other deficiency is that we only get an interrupt when the
>> amount of
>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>> FSL_ESPI_RXTHR left to be received we will never pull them out of
>> the fifo.
>>
> SPIM_DON will trigger an interrupt once the last characters have been
> transferred, and read the remaining characters from the FIFO.
 The T2080RM that I have says the following about the DON bit

 "Last character was transmitted. The last character was transmitted
 and a new command can be written for the next frame."

 That does at least seem to fit with my assertion that it's all about
 the TX direction. But the fact that it doesn't happen all the time
 throws some doubt on it.

> I think the reason I'm seeing some variability is because of how fast
>> (or slow) the interrupts get processed and how fast the spi-nor
>> chip can
>> fill the CPUs rx fifo.
>>
> To rule out timing issues at high bus frequencies I initially asked
> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
> or even less, then timing shouldn't be an issue.
 Yes I've currently got spi-max-frequency = <100>; in my dts. I
 would also expect a slower frequency would fit my "DON is for TX"
 narrative.
> Last relevant functional changes have been done almost 4 years ago.
> And yours is the first such report I see. So question is what could
> be so
> special with your setup that it seems you're the only one being
> affected.
> The scenarios you describe are standard, therefore much more people
> should be affected in case of a driver bug.
 Agreed. But even on my hardware (which may have a latent issue
 despite being in the field for going on 5 years) the issue only
 triggers under some fairly specific circumstances.
> You said that kernel config impacts how frequently the issue happens.
> Therefore question is what's the diff in kernel config, and how could
> the differences be related to SPI.
 It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
 impact but every time I found something that seemed to be having an
 impact I've been able to disprove it. I actually think its about how
 busy the system is which may or may not affect when we get round to
 processing the interrupts.

 I have managed to get the 'Transfer done but SPIE_DON isn't set!' to
 occur on the T2080RDB.

 I've had to add the following to expose the environment as a mtd
 partition

 diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 index ff87e67c70da..fbf95fc1fd68 100644
 --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
 @@ -116,6 +116,15 @@ flash@0 {
      compatible = "micron,n25q512ax3",
 "jedec,spi-nor";
      reg = <0>;
      spi-max-frequency = <1000>; /*
 input clock */
 +
 +   partition@u-boot {
 +    reg = <0x 0x0010>;
 +    label = "u-boot";
 +    };
 +    partition@u-boot-env {
 + 

Re: fsl_espi errors on v5.7.15

2020-08-26 Thread Heiner Kallweit
On 26.08.2020 08:07, Chris Packham wrote:
> 
> On 26/08/20 1:48 pm, Chris Packham wrote:
>>
>> On 26/08/20 10:22 am, Chris Packham wrote:
>>> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>>>
>>> 
 I've been staring at spi-fsl-espi.c for while now and I think I've
> identified a couple of deficiencies that may or may not be related 
> to my
> issue.
>
> First I think the 'Transfer done but SPIE_DON isn't set' message 
> can be
> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE 
> register.
> We also write back to it to clear the current events. We re-read it in
> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
> naturally end up in that situation if we're doing a large read. 
> Consider
> the messages for reading a block of data from a spi-nor chip
>
>    tx = READ_OP + ADDR
>    rx = data
>
> We setup the transfer and pump out the tx_buf. The first interrupt 
> goes
> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
> continues until we've received all the data and we finish with 
> ESPI_SPIE
> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
> isn't set.
>
> The other deficiency is that we only get an interrupt when the 
> amount of
> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
> FSL_ESPI_RXTHR left to be received we will never pull them out of 
> the fifo.
>
 SPIM_DON will trigger an interrupt once the last characters have been
 transferred, and read the remaining characters from the FIFO.
>>>
>>> The T2080RM that I have says the following about the DON bit
>>>
>>> "Last character was transmitted. The last character was transmitted 
>>> and a new command can be written for the next frame."
>>>
>>> That does at least seem to fit with my assertion that it's all about 
>>> the TX direction. But the fact that it doesn't happen all the time 
>>> throws some doubt on it.
>>>
 I think the reason I'm seeing some variability is because of how fast
> (or slow) the interrupts get processed and how fast the spi-nor 
> chip can
> fill the CPUs rx fifo.
>
 To rule out timing issues at high bus frequencies I initially asked
 for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
 or even less, then timing shouldn't be an issue.
>>> Yes I've currently got spi-max-frequency = <100>; in my dts. I 
>>> would also expect a slower frequency would fit my "DON is for TX" 
>>> narrative.
 Last relevant functional changes have been done almost 4 years ago.
 And yours is the first such report I see. So question is what could 
 be so
 special with your setup that it seems you're the only one being 
 affected.
 The scenarios you describe are standard, therefore much more people
 should be affected in case of a driver bug.
>>> Agreed. But even on my hardware (which may have a latent issue 
>>> despite being in the field for going on 5 years) the issue only 
>>> triggers under some fairly specific circumstances.
 You said that kernel config impacts how frequently the issue happens.
 Therefore question is what's the diff in kernel config, and how could
 the differences be related to SPI.
>>>
>>> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
>>> impact but every time I found something that seemed to be having an 
>>> impact I've been able to disprove it. I actually think its about how 
>>> busy the system is which may or may not affect when we get round to 
>>> processing the interrupts.
>>>
>>> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
>>> occur on the T2080RDB.
>>>
>>> I've had to add the following to expose the environment as a mtd 
>>> partition
>>>
>>> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
>>> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> index ff87e67c70da..fbf95fc1fd68 100644
>>> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> @@ -116,6 +116,15 @@ flash@0 {
>>>     compatible = "micron,n25q512ax3", 
>>> "jedec,spi-nor";
>>>     reg = <0>;
>>>     spi-max-frequency = <1000>; /* 
>>> input clock */
>>> +
>>> +   partition@u-boot {
>>> +    reg = <0x 0x0010>;
>>> +    label = "u-boot";
>>> +    };
>>> +    partition@u-boot-env {
>>> +    reg = <0x0010 0x0001>;
>>> +    label = "u-boot-env";
>>> +   

Re: fsl_espi errors on v5.7.15

2020-08-26 Thread Chris Packham

On 26/08/20 1:48 pm, Chris Packham wrote:
>
> On 26/08/20 10:22 am, Chris Packham wrote:
>> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>>
>> 
>>> I've been staring at spi-fsl-espi.c for while now and I think I've
 identified a couple of deficiencies that may or may not be related 
 to my
 issue.

 First I think the 'Transfer done but SPIE_DON isn't set' message 
 can be
 generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE 
 register.
 We also write back to it to clear the current events. We re-read it in
 fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
 naturally end up in that situation if we're doing a large read. 
 Consider
 the messages for reading a block of data from a spi-nor chip

    tx = READ_OP + ADDR
    rx = data

 We setup the transfer and pump out the tx_buf. The first interrupt 
 goes
 off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
 clear ESPI_SPIE and wait for the next interrupt. The next interrupt
 fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
 continues until we've received all the data and we finish with 
 ESPI_SPIE
 having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
 isn't set.

 The other deficiency is that we only get an interrupt when the 
 amount of
 data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
 FSL_ESPI_RXTHR left to be received we will never pull them out of 
 the fifo.

>>> SPIM_DON will trigger an interrupt once the last characters have been
>>> transferred, and read the remaining characters from the FIFO.
>>
>> The T2080RM that I have says the following about the DON bit
>>
>> "Last character was transmitted. The last character was transmitted 
>> and a new command can be written for the next frame."
>>
>> That does at least seem to fit with my assertion that it's all about 
>> the TX direction. But the fact that it doesn't happen all the time 
>> throws some doubt on it.
>>
>>> I think the reason I'm seeing some variability is because of how fast
 (or slow) the interrupts get processed and how fast the spi-nor 
 chip can
 fill the CPUs rx fifo.

>>> To rule out timing issues at high bus frequencies I initially asked
>>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>>> or even less, then timing shouldn't be an issue.
>> Yes I've currently got spi-max-frequency = <100>; in my dts. I 
>> would also expect a slower frequency would fit my "DON is for TX" 
>> narrative.
>>> Last relevant functional changes have been done almost 4 years ago.
>>> And yours is the first such report I see. So question is what could 
>>> be so
>>> special with your setup that it seems you're the only one being 
>>> affected.
>>> The scenarios you describe are standard, therefore much more people
>>> should be affected in case of a driver bug.
>> Agreed. But even on my hardware (which may have a latent issue 
>> despite being in the field for going on 5 years) the issue only 
>> triggers under some fairly specific circumstances.
>>> You said that kernel config impacts how frequently the issue happens.
>>> Therefore question is what's the diff in kernel config, and how could
>>> the differences be related to SPI.
>>
>> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
>> impact but every time I found something that seemed to be having an 
>> impact I've been able to disprove it. I actually think its about how 
>> busy the system is which may or may not affect when we get round to 
>> processing the interrupts.
>>
>> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
>> occur on the T2080RDB.
>>
>> I've had to add the following to expose the environment as a mtd 
>> partition
>>
>> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
>> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> index ff87e67c70da..fbf95fc1fd68 100644
>> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>> @@ -116,6 +116,15 @@ flash@0 {
>>     compatible = "micron,n25q512ax3", 
>> "jedec,spi-nor";
>>     reg = <0>;
>>     spi-max-frequency = <1000>; /* 
>> input clock */
>> +
>> +   partition@u-boot {
>> +    reg = <0x 0x0010>;
>> +    label = "u-boot";
>> +    };
>> +    partition@u-boot-env {
>> +    reg = <0x0010 0x0001>;
>> +    label = "u-boot-env";
>> +    };
>>     };
>>     };
>>
>> And I'm using the following script to poke at the environment 
>> (warning if 

Re: fsl_espi errors on v5.7.15

2020-08-25 Thread Chris Packham

On 26/08/20 10:22 am, Chris Packham wrote:
> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>
> 
>> I've been staring at spi-fsl-espi.c for while now and I think I've
>>> identified a couple of deficiencies that may or may not be related 
>>> to my
>>> issue.
>>>
>>> First I think the 'Transfer done but SPIE_DON isn't set' message can be
>>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register.
>>> We also write back to it to clear the current events. We re-read it in
>>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>>> naturally end up in that situation if we're doing a large read. 
>>> Consider
>>> the messages for reading a block of data from a spi-nor chip
>>>
>>>    tx = READ_OP + ADDR
>>>    rx = data
>>>
>>> We setup the transfer and pump out the tx_buf. The first interrupt goes
>>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>>> continues until we've received all the data and we finish with 
>>> ESPI_SPIE
>>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>>> isn't set.
>>>
>>> The other deficiency is that we only get an interrupt when the 
>>> amount of
>>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>>> FSL_ESPI_RXTHR left to be received we will never pull them out of 
>>> the fifo.
>>>
>> SPIM_DON will trigger an interrupt once the last characters have been
>> transferred, and read the remaining characters from the FIFO.
>
> The T2080RM that I have says the following about the DON bit
>
> "Last character was transmitted. The last character was transmitted 
> and a new command can be written for the next frame."
>
> That does at least seem to fit with my assertion that it's all about 
> the TX direction. But the fact that it doesn't happen all the time 
> throws some doubt on it.
>
>> I think the reason I'm seeing some variability is because of how fast
>>> (or slow) the interrupts get processed and how fast the spi-nor chip 
>>> can
>>> fill the CPUs rx fifo.
>>>
>> To rule out timing issues at high bus frequencies I initially asked
>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>> or even less, then timing shouldn't be an issue.
> Yes I've currently got spi-max-frequency = <100>; in my dts. I 
> would also expect a slower frequency would fit my "DON is for TX" 
> narrative.
>> Last relevant functional changes have been done almost 4 years ago.
>> And yours is the first such report I see. So question is what could 
>> be so
>> special with your setup that it seems you're the only one being 
>> affected.
>> The scenarios you describe are standard, therefore much more people
>> should be affected in case of a driver bug.
> Agreed. But even on my hardware (which may have a latent issue despite 
> being in the field for going on 5 years) the issue only triggers under 
> some fairly specific circumstances.
>> You said that kernel config impacts how frequently the issue happens.
>> Therefore question is what's the diff in kernel config, and how could
>> the differences be related to SPI.
>
> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
> impact but every time I found something that seemed to be having an 
> impact I've been able to disprove it. I actually think its about how 
> busy the system is which may or may not affect when we get round to 
> processing the interrupts.
>
> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
> occur on the T2080RDB.
>
> I've had to add the following to expose the environment as a mtd 
> partition
>
> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> index ff87e67c70da..fbf95fc1fd68 100644
> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> @@ -116,6 +116,15 @@ flash@0 {
>     compatible = "micron,n25q512ax3", 
> "jedec,spi-nor";
>     reg = <0>;
>     spi-max-frequency = <1000>; /* 
> input clock */
> +
> +   partition@u-boot {
> +    reg = <0x 0x0010>;
> +    label = "u-boot";
> +    };
> +    partition@u-boot-env {
> +    reg = <0x0010 0x0001>;
> +    label = "u-boot-env";
> +    };
>     };
>     };
>
> And I'm using the following script to poke at the environment (warning 
> if anyone does try this and the bug hits it can render your u-boot 
> environment invalid).
>
> cat flash/fw_env_test.sh
> #!/bin/sh
>
> generate_fw_env_config()
> {
>   cat 

Re: fsl_espi errors on v5.7.15

2020-08-25 Thread Chris Packham
On 25/08/20 7:22 pm, Heiner Kallweit wrote:


> I've been staring at spi-fsl-espi.c for while now and I think I've
>> identified a couple of deficiencies that may or may not be related to my
>> issue.
>>
>> First I think the 'Transfer done but SPIE_DON isn't set' message can be
>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register.
>> We also write back to it to clear the current events. We re-read it in
>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>> naturally end up in that situation if we're doing a large read. Consider
>> the messages for reading a block of data from a spi-nor chip
>>
>>    tx = READ_OP + ADDR
>>    rx = data
>>
>> We setup the transfer and pump out the tx_buf. The first interrupt goes
>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>> continues until we've received all the data and we finish with ESPI_SPIE
>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>> isn't set.
>>
>> The other deficiency is that we only get an interrupt when the amount of
>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>> FSL_ESPI_RXTHR left to be received we will never pull them out of the fifo.
>>
> SPIM_DON will trigger an interrupt once the last characters have been
> transferred, and read the remaining characters from the FIFO.

The T2080RM that I have says the following about the DON bit

"Last character was transmitted. The last character was transmitted and 
a new command can be written for the next frame."

That does at least seem to fit with my assertion that it's all about the 
TX direction. But the fact that it doesn't happen all the time throws 
some doubt on it.

> I think the reason I'm seeing some variability is because of how fast
>> (or slow) the interrupts get processed and how fast the spi-nor chip can
>> fill the CPUs rx fifo.
>>
> To rule out timing issues at high bus frequencies I initially asked
> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
> or even less, then timing shouldn't be an issue.
Yes I've currently got spi-max-frequency = <100>; in my dts. I would 
also expect a slower frequency would fit my "DON is for TX" narrative.
> Last relevant functional changes have been done almost 4 years ago.
> And yours is the first such report I see. So question is what could be so
> special with your setup that it seems you're the only one being affected.
> The scenarios you describe are standard, therefore much more people
> should be affected in case of a driver bug.
Agreed. But even on my hardware (which may have a latent issue despite 
being in the field for going on 5 years) the issue only triggers under 
some fairly specific circumstances.
> You said that kernel config impacts how frequently the issue happens.
> Therefore question is what's the diff in kernel config, and how could
> the differences be related to SPI.

It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
impact but every time I found something that seemed to be having an 
impact I've been able to disprove it. I actually think its about how 
busy the system is which may or may not affect when we get round to 
processing the interrupts.

I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
occur on the T2080RDB.

I've had to add the following to expose the environment as a mtd partition

diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
index ff87e67c70da..fbf95fc1fd68 100644
--- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
@@ -116,6 +116,15 @@ flash@0 {
     compatible = "micron,n25q512ax3", 
"jedec,spi-nor";
     reg = <0>;
     spi-max-frequency = <1000>; /* 
input clock */
+
+   partition@u-boot {
+    reg = <0x 0x0010>;
+    label = "u-boot";
+    };
+    partition@u-boot-env {
+    reg = <0x0010 0x0001>;
+    label = "u-boot-env";
+    };
     };
     };

And I'm using the following script to poke at the environment (warning 
if anyone does try this and the bug hits it can render your u-boot 
environment invalid).

cat flash/fw_env_test.sh
#!/bin/sh

generate_fw_env_config()
{
   cat /proc/mtd | sed 's/[:"]//g' | while read dev size erasesize name ; do
  echo "$dev $size $erasesize $name"
  [ "$name" = "u-boot-env" ] && echo "/dev/$dev 0x 0x2000 
$erasesize" >/flash/fw_env.config
   done
}

cycles=10
[ $# -ge 

Re: fsl_espi errors on v5.7.15

2020-08-25 Thread Heiner Kallweit
On 25.08.2020 05:54, Chris Packham wrote:
> 
> On 25/08/20 10:04 am, Chris Packham wrote:
>>
>> On 20/08/20 9:08 am, Chris Packham wrote:
>>>
>>> On 19/08/20 6:15 pm, Heiner Kallweit wrote:
 On 19.08.2020 00:44, Chris Packham wrote:
> Hi Again,
>
> On 17/08/20 9:09 am, Chris Packham wrote:
>
>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>>> On 14.08.2020 04:48, Chris Packham wrote:
 Hi,

 I'm seeing a problem with accessing spi-nor after upgrading a T2081
 based system to linux v5.7.15

 For this board u-boot and the u-boot environment live on spi-nor.

 When I use fw_setenv from userspace I get the following kernel logs

 # fw_setenv foo=1
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 ...

>>> This error reporting doesn't exist yet in 4.4. So you may have an 
>>> issue
>>> under 4.4 too, it's just not reported.
>>> Did you verify that under 4.4 fw_setenv actually has an effect?
>> Just double checked and yes under 4.4 the setting does get saved.
 If I run fw_printenv (before getting it into a bad state) it is 
 able to
 display the content of the boards u-boot environment.

>>> This might indicate an issue with spi being locked. I've seen 
>>> related
>>> questions, just use the search engine of your choice and check for
>>> fw_setenv and locked.
>> I'm running a version of fw_setenv which includes
>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it 
>> shouldn't
>> be locking things unnecessarily.
 If been unsuccessful in producing a setup for bisecting the 
 issue. I do
 know the issue doesn't occur on the old 4.4.x based kernel but 
 that's
 probably not much help.

 Any pointers on what the issue (and/or solution) might be.
> I finally managed to get our board running with a vanilla kernel. With
> corenet64_smp_defconfig I occasionally see
>
>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>
> other than the message things seem to be working.
>
> With a custom defconfig I see
>
>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>     fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
> empty!
>     fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>     ...
>
> and access to the spi-nor does not work until the board is reset.
>
> I'll try and pick apart the differences between the two defconfigs.
>>>
>>> I now think my earlier testing is invalid. I have seen the problem 
>>> with either defconfig if I try hard enough. I had convinced myself 
>>> that the problem was CONFIG_PREEMPT but that was before I found 
>>> boot-to-boot differences with the same kernel.
>>>
>>> It's possible that I'm chasing multiple issues with the same symptom.
>>>
>>> The error I'm most concerned with is in the sequence
>>> 1. boot with old image
>>> 2. write environment
>>> 3. boot with new image
>>> 4. write environment
>>> 5. write fails and environment is corrupted
>>>
>>> After I recover the system things sometimes seem fine. Until I repeat 
>>> the sequence above.
>>>
 Also relevant may be:
 - Which dts are you using?
>>> Custom but based heavily on the t2080rdb.
 - What's the spi-nor type, and at which frequency are you operating it?
>>> The board has several alternate parts for the spi-nor so the dts just 
>>> specifies 

Re: fsl_espi errors on v5.7.15

2020-08-24 Thread Chris Packham

On 25/08/20 10:04 am, Chris Packham wrote:
>
> On 20/08/20 9:08 am, Chris Packham wrote:
>>
>> On 19/08/20 6:15 pm, Heiner Kallweit wrote:
>>> On 19.08.2020 00:44, Chris Packham wrote:
 Hi Again,

 On 17/08/20 9:09 am, Chris Packham wrote:

> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>> On 14.08.2020 04:48, Chris Packham wrote:
>>> Hi,
>>>
>>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>>> based system to linux v5.7.15
>>>
>>> For this board u-boot and the u-boot environment live on spi-nor.
>>>
>>> When I use fw_setenv from userspace I get the following kernel logs
>>>
>>> # fw_setenv foo=1
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
>>> empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
>>> empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
>>> empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> ...
>>>
>> This error reporting doesn't exist yet in 4.4. So you may have an 
>> issue
>> under 4.4 too, it's just not reported.
>> Did you verify that under 4.4 fw_setenv actually has an effect?
> Just double checked and yes under 4.4 the setting does get saved.
>>> If I run fw_printenv (before getting it into a bad state) it is 
>>> able to
>>> display the content of the boards u-boot environment.
>>>
>> This might indicate an issue with spi being locked. I've seen 
>> related
>> questions, just use the search engine of your choice and check for
>> fw_setenv and locked.
> I'm running a version of fw_setenv which includes
> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it 
> shouldn't
> be locking things unnecessarily.
>>> If been unsuccessful in producing a setup for bisecting the 
>>> issue. I do
>>> know the issue doesn't occur on the old 4.4.x based kernel but 
>>> that's
>>> probably not much help.
>>>
>>> Any pointers on what the issue (and/or solution) might be.
 I finally managed to get our board running with a vanilla kernel. With
 corenet64_smp_defconfig I occasionally see

     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

 other than the message things seem to be working.

 With a custom defconfig I see

     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
     fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
 empty!
     fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
     ...

 and access to the spi-nor does not work until the board is reset.

 I'll try and pick apart the differences between the two defconfigs.
>>
>> I now think my earlier testing is invalid. I have seen the problem 
>> with either defconfig if I try hard enough. I had convinced myself 
>> that the problem was CONFIG_PREEMPT but that was before I found 
>> boot-to-boot differences with the same kernel.
>>
>> It's possible that I'm chasing multiple issues with the same symptom.
>>
>> The error I'm most concerned with is in the sequence
>> 1. boot with old image
>> 2. write environment
>> 3. boot with new image
>> 4. write environment
>> 5. write fails and environment is corrupted
>>
>> After I recover the system things sometimes seem fine. Until I repeat 
>> the sequence above.
>>
>>> Also relevant may be:
>>> - Which dts are you using?
>> Custom but based heavily on the t2080rdb.
>>> - What's the spi-nor type, and at which frequency are you operating it?
>> The board has several alternate parts for the spi-nor so the dts just 
>> specifies compatible = "jedec,spi-nor" the actual chip detected on 
>> the board I have is "n25q032a (4096 Kbytes)". The dts sets 
>> spi-max-frequency = <1000> I haven't 

Re: fsl_espi errors on v5.7.15

2020-08-24 Thread Chris Packham

On 20/08/20 9:08 am, Chris Packham wrote:
>
> On 19/08/20 6:15 pm, Heiner Kallweit wrote:
>> On 19.08.2020 00:44, Chris Packham wrote:
>>> Hi Again,
>>>
>>> On 17/08/20 9:09 am, Chris Packham wrote:
>>>
 On 14/08/20 6:19 pm, Heiner Kallweit wrote:
> On 14.08.2020 04:48, Chris Packham wrote:
>> Hi,
>>
>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>> based system to linux v5.7.15
>>
>> For this board u-boot and the u-boot environment live on spi-nor.
>>
>> When I use fw_setenv from userspace I get the following kernel logs
>>
>> # fw_setenv foo=1
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> ...
>>
> This error reporting doesn't exist yet in 4.4. So you may have an 
> issue
> under 4.4 too, it's just not reported.
> Did you verify that under 4.4 fw_setenv actually has an effect?
 Just double checked and yes under 4.4 the setting does get saved.
>> If I run fw_printenv (before getting it into a bad state) it is 
>> able to
>> display the content of the boards u-boot environment.
>>
> This might indicate an issue with spi being locked. I've seen related
> questions, just use the search engine of your choice and check for
> fw_setenv and locked.
 I'm running a version of fw_setenv which includes
 https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't
 be locking things unnecessarily.
>> If been unsuccessful in producing a setup for bisecting the 
>> issue. I do
>> know the issue doesn't occur on the old 4.4.x based kernel but 
>> that's
>> probably not much help.
>>
>> Any pointers on what the issue (and/or solution) might be.
>>> I finally managed to get our board running with a vanilla kernel. With
>>> corenet64_smp_defconfig I occasionally see
>>>
>>>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>>
>>> other than the message things seem to be working.
>>>
>>> With a custom defconfig I see
>>>
>>>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>>     fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't 
>>> empty!
>>>     fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>     ...
>>>
>>> and access to the spi-nor does not work until the board is reset.
>>>
>>> I'll try and pick apart the differences between the two defconfigs.
>
> I now think my earlier testing is invalid. I have seen the problem 
> with either defconfig if I try hard enough. I had convinced myself 
> that the problem was CONFIG_PREEMPT but that was before I found 
> boot-to-boot differences with the same kernel.
>
> It's possible that I'm chasing multiple issues with the same symptom.
>
> The error I'm most concerned with is in the sequence
> 1. boot with old image
> 2. write environment
> 3. boot with new image
> 4. write environment
> 5. write fails and environment is corrupted
>
> After I recover the system things sometimes seem fine. Until I repeat 
> the sequence above.
>
>> Also relevant may be:
>> - Which dts are you using?
> Custom but based heavily on the t2080rdb.
>> - What's the spi-nor type, and at which frequency are you operating it?
> The board has several alternate parts for the spi-nor so the dts just 
> specifies compatible = "jedec,spi-nor" the actual chip detected on the 
> board I have is "n25q032a (4096 Kbytes)". The dts sets 
> spi-max-frequency = <1000> I haven't measured the actual frequency 
> on the bus.
>> - Does the issue still happen if you lower the frequency?
> I did play around with the frequency initially but I should probably 
> give that another 

Re: fsl_espi errors on v5.7.15

2020-08-19 Thread Chris Packham

On 19/08/20 6:15 pm, Heiner Kallweit wrote:
> On 19.08.2020 00:44, Chris Packham wrote:
>> Hi Again,
>>
>> On 17/08/20 9:09 am, Chris Packham wrote:
>>
>>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
 On 14.08.2020 04:48, Chris Packham wrote:
> Hi,
>
> I'm seeing a problem with accessing spi-nor after upgrading a T2081
> based system to linux v5.7.15
>
> For this board u-boot and the u-boot environment live on spi-nor.
>
> When I use fw_setenv from userspace I get the following kernel logs
>
> # fw_setenv foo=1
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> ...
>
 This error reporting doesn't exist yet in 4.4. So you may have an issue
 under 4.4 too, it's just not reported.
 Did you verify that under 4.4 fw_setenv actually has an effect?
>>> Just double checked and yes under 4.4 the setting does get saved.
> If I run fw_printenv (before getting it into a bad state) it is able to
> display the content of the boards u-boot environment.
>
 This might indicate an issue with spi being locked. I've seen related
 questions, just use the search engine of your choice and check for
 fw_setenv and locked.
>>> I'm running a version of fw_setenv which includes
>>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't
>>> be locking things unnecessarily.
> If been unsuccessful in producing a setup for bisecting the issue. I do
> know the issue doesn't occur on the old 4.4.x based kernel but that's
> probably not much help.
>
> Any pointers on what the issue (and/or solution) might be.
>> I finally managed to get our board running with a vanilla kernel. With
>> corenet64_smp_defconfig I occasionally see
>>
>>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>
>> other than the message things seem to be working.
>>
>> With a custom defconfig I see
>>
>>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>     fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>     fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>     ...
>>
>> and access to the spi-nor does not work until the board is reset.
>>
>> I'll try and pick apart the differences between the two defconfigs.

I now think my earlier testing is invalid. I have seen the problem with 
either defconfig if I try hard enough. I had convinced myself that the 
problem was CONFIG_PREEMPT but that was before I found boot-to-boot 
differences with the same kernel.

It's possible that I'm chasing multiple issues with the same symptom.

The error I'm most concerned with is in the sequence
1. boot with old image
2. write environment
3. boot with new image
4. write environment
5. write fails and environment is corrupted

After I recover the system things sometimes seem fine. Until I repeat 
the sequence above.

> Also relevant may be:
> - Which dts are you using?
Custom but based heavily on the t2080rdb.
> - What's the spi-nor type, and at which frequency are you operating it?
The board has several alternate parts for the spi-nor so the dts just 
specifies compatible = "jedec,spi-nor" the actual chip detected on the 
board I have is "n25q032a (4096 Kbytes)". The dts sets spi-max-frequency 
= <1000> I haven't measured the actual frequency on the bus.
> - Does the issue still happen if you lower the frequency?
I did play around with the frequency initially but I should probably 
give that another go now that I have a better reproduction method.

Re: fsl_espi errors on v5.7.15

2020-08-19 Thread Heiner Kallweit
On 19.08.2020 00:44, Chris Packham wrote:
> Hi Again,
> 
> On 17/08/20 9:09 am, Chris Packham wrote:
> 
>>
>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>>> On 14.08.2020 04:48, Chris Packham wrote:
 Hi,

 I'm seeing a problem with accessing spi-nor after upgrading a T2081
 based system to linux v5.7.15

 For this board u-boot and the u-boot environment live on spi-nor.

 When I use fw_setenv from userspace I get the following kernel logs

 # fw_setenv foo=1
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 ...

>>> This error reporting doesn't exist yet in 4.4. So you may have an issue
>>> under 4.4 too, it's just not reported.
>>> Did you verify that under 4.4 fw_setenv actually has an effect?
>> Just double checked and yes under 4.4 the setting does get saved.
 If I run fw_printenv (before getting it into a bad state) it is able to
 display the content of the boards u-boot environment.

>>> This might indicate an issue with spi being locked. I've seen related
>>> questions, just use the search engine of your choice and check for
>>> fw_setenv and locked.
>> I'm running a version of fw_setenv which includes 
>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't 
>> be locking things unnecessarily.
 If been unsuccessful in producing a setup for bisecting the issue. I do
 know the issue doesn't occur on the old 4.4.x based kernel but that's
 probably not much help.

 Any pointers on what the issue (and/or solution) might be.
> 
> I finally managed to get our board running with a vanilla kernel. With 
> corenet64_smp_defconfig I occasionally see
> 
>    fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> 
> other than the message things seem to be working.
> 
> With a custom defconfig I see
> 
>    fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>    fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>    fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>    ...
> 
> and access to the spi-nor does not work until the board is reset.
> 
> I'll try and pick apart the differences between the two defconfigs.
> 
Also relevant may be:
- Which dts are you using?
- What's the spi-nor type, and at which frequency are you operating it?
- Does the issue still happen if you lower the frequency?


Re: fsl_espi errors on v5.7.15

2020-08-18 Thread Chris Packham
Hi Again,

On 17/08/20 9:09 am, Chris Packham wrote:

>
> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>> On 14.08.2020 04:48, Chris Packham wrote:
>>> Hi,
>>>
>>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>>> based system to linux v5.7.15
>>>
>>> For this board u-boot and the u-boot environment live on spi-nor.
>>>
>>> When I use fw_setenv from userspace I get the following kernel logs
>>>
>>> # fw_setenv foo=1
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>> ...
>>>
>> This error reporting doesn't exist yet in 4.4. So you may have an issue
>> under 4.4 too, it's just not reported.
>> Did you verify that under 4.4 fw_setenv actually has an effect?
> Just double checked and yes under 4.4 the setting does get saved.
>>> If I run fw_printenv (before getting it into a bad state) it is able to
>>> display the content of the boards u-boot environment.
>>>
>> This might indicate an issue with spi being locked. I've seen related
>> questions, just use the search engine of your choice and check for
>> fw_setenv and locked.
> I'm running a version of fw_setenv which includes 
> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't 
> be locking things unnecessarily.
>>> If been unsuccessful in producing a setup for bisecting the issue. I do
>>> know the issue doesn't occur on the old 4.4.x based kernel but that's
>>> probably not much help.
>>>
>>> Any pointers on what the issue (and/or solution) might be.

I finally managed to get our board running with a vanilla kernel. With 
corenet64_smp_defconfig I occasionally see

   fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!

other than the message things seem to be working.

With a custom defconfig I see

   fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
   fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
   fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
   ...

and access to the spi-nor does not work until the board is reset.

I'll try and pick apart the differences between the two defconfigs.


Re: fsl_espi errors on v5.7.15

2020-08-16 Thread Chris Packham

On 14/08/20 6:19 pm, Heiner Kallweit wrote:
> On 14.08.2020 04:48, Chris Packham wrote:
>> Hi,
>>
>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>> based system to linux v5.7.15
>>
>> For this board u-boot and the u-boot environment live on spi-nor.
>>
>> When I use fw_setenv from userspace I get the following kernel logs
>>
>> # fw_setenv foo=1
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> ...
>>
> This error reporting doesn't exist yet in 4.4. So you may have an issue
> under 4.4 too, it's just not reported.
> Did you verify that under 4.4 fw_setenv actually has an effect?
Just double checked and yes under 4.4 the setting does get saved.
>> If I run fw_printenv (before getting it into a bad state) it is able to
>> display the content of the boards u-boot environment.
>>
> This might indicate an issue with spi being locked. I've seen related
> questions, just use the search engine of your choice and check for
> fw_setenv and locked.
I'm running a version of fw_setenv which includes 
https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't 
be locking things unnecessarily.
>> If been unsuccessful in producing a setup for bisecting the issue. I do
>> know the issue doesn't occur on the old 4.4.x based kernel but that's
>> probably not much help.
>>
>> Any pointers on what the issue (and/or solution) might be.
>>
>> Thanks,
>> Chris
>>
> Heiner

Re: fsl_espi errors on v5.7.15

2020-08-14 Thread Heiner Kallweit
On 14.08.2020 04:48, Chris Packham wrote:
> Hi,
> 
> I'm seeing a problem with accessing spi-nor after upgrading a T2081 
> based system to linux v5.7.15
> 
> For this board u-boot and the u-boot environment live on spi-nor.
> 
> When I use fw_setenv from userspace I get the following kernel logs
> 
> # fw_setenv foo=1
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> ...
> 

This error reporting doesn't exist yet in 4.4. So you may have an issue
under 4.4 too, it's just not reported.
Did you verify that under 4.4 fw_setenv actually has an effect?

> If I run fw_printenv (before getting it into a bad state) it is able to 
> display the content of the boards u-boot environment.
> 

This might indicate an issue with spi being locked. I've seen related
questions, just use the search engine of your choice and check for
fw_setenv and locked.

> If been unsuccessful in producing a setup for bisecting the issue. I do 
> know the issue doesn't occur on the old 4.4.x based kernel but that's 
> probably not much help.
> 
> Any pointers on what the issue (and/or solution) might be.
> 
> Thanks,
> Chris
> 

Heiner


fsl_espi errors on v5.7.15

2020-08-13 Thread Chris Packham
Hi,

I'm seeing a problem with accessing spi-nor after upgrading a T2081 
based system to linux v5.7.15

For this board u-boot and the u-boot environment live on spi-nor.

When I use fw_setenv from userspace I get the following kernel logs

# fw_setenv foo=1
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
...

If I run fw_printenv (before getting it into a bad state) it is able to 
display the content of the boards u-boot environment.

If been unsuccessful in producing a setup for bisecting the issue. I do 
know the issue doesn't occur on the old 4.4.x based kernel but that's 
probably not much help.

Any pointers on what the issue (and/or solution) might be.

Thanks,
Chris