Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-21 Thread Peter Rosin
On 2015-10-20 15:27, Ludovic Desroches wrote:
> On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote:
>> On 2015-10-19 10:51, Ludovic Desroches wrote:
>>> Hi Peter,
>>>
>>> On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
 On 2015-10-16 01:47, Peter Rosin wrote:
> On 2015-10-14 07:43, Ludovic Desroches wrote:
>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
>>> On 2015-10-13 18:47, Cyrille Pitchen wrote:
 Le 13/10/2015 17:19, Peter Rosin a écrit :
> On 2015-10-13 16:21, Ludovic Desroches wrote:
>>>
>>> [...]
>>>
> I have started to get this when I run with this patch:
>
> [   73.31] at91_i2c f0014000.i2c: RXRDY still set!
> [  198.20] at91_i2c f0014000.i2c: RXRDY still set!
> [  509.88] at91_i2c f0014000.i2c: RXRDY still set!
> [  615.75] at91_i2c f0014000.i2c: RXRDY still set!
> [  617.75] at91_i2c f0014000.i2c: RXRDY still set!
> [ 1766.64] at91_i2c f0014000.i2c: RXRDY still set!
> [ 2035.38] at91_i2c f0014000.i2c: RXRDY still set!
> [ 2227.19] at91_i2c f0014000.i2c: RXRDY still set!
> [ 2313.10] at91_i2c f0014000.i2c: RXRDY still set!
>
> My USB serial dongle was hung which was why I didn't notice until just 
> now.
>
> This is probably not when communication with the eeprom though, and 
> certainly not
> writing to it, but perhpaps when polling the temperature (using the jc42 
> driver).
> I'll investigate further in the morning to see if I can pinpoint it.

 Yep, it's the jc42 accesses that triggers this (to the same chip as the
 eeprom, but a different block of transistors I suppose).

 Looking at the i2c bus, the accesses normally go like this:

 [0.00] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
 [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
 [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
 [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
 [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
 [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
 [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
 [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P

 I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
 limiter in the jc42 driver to get more frequent incidents).

 But when the diagnostic (RXRDY still set!) is output it continues
 with this:

 [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
 [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
 [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
 [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P

 Notice the ~5 ms delay (about the time it should take to output
 the diagnostic at 115200 baud) after the access to register 0x05
 at 0.399755.

 This is the only thing that I can observe on the bus, so it appears
 to be harmless.

 It appears that the i2c access at 0.399755 finds the TWI
 registers in an odd state, but nothing from the access at
 0.198520 appears to have gone wrong. Is this a race? Anyway,
 the diagnostic is pretty frequent and annoying. printk_once?
>>>
>>> I'll try to reproduce it on my side. The only issue you have is having
>>> the message about RXRDY? I mean no issue with i2c transfers?
>>
>> Exactly, the only issue is the message, the bus looks good and transfers
>> work as they should AFAICT.
>>> It is not the possible bug we had in mind, this bug will prevent the
>>> master device to release the i2c bus. It will stop the transfer but
>>> without sending a stop on the bus.
>>
>> Agreed, this is not about the extra frame caused by the spurious write
>> to the THR register. This is something else.
>>
>> One suspicion is that the driver gets an unexpected irq from its own
>> NACK (the one that it puts out to end the read) and races with the
>> expected interrupt at TXCOMP?
>>
> 
> I have discussed with the designer of this IP. So the NACK flag is only
> used for 'real' nack not protocol ones.
> 
> Concerning the read issue, it have attached a patch (apply it on top of
> Cyrille's patch). Could you have a try? I have reproduced your issue only one
> time so it's hard for me to tell if it works well or not.

Since you removed the dev_err call, I don't really know if the underlying
issue is solved or not (if there ever was any).

Reading RHR in the if at the top of at91_twi_read_next_byte() never happens
for me (I added a witness and left it running overnight), so that
seems like an unrelated change.

Checking RXRDY before TXCOMP|NACK seems about equal, but what do I know?

But the "problem" is gone of course, no flood of messages, but I can't tell
if this is just papering over or a fix.

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-21 Thread Ludovic Desroches
On Wed, Oct 21, 2015 at 09:42:40AM +0200, Peter Rosin wrote:
> On 2015-10-21 09:21, Peter Rosin wrote:
> > On 2015-10-20 15:27, Ludovic Desroches wrote:
> >> On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote:
> >>> On 2015-10-19 10:51, Ludovic Desroches wrote:
>  Hi Peter,
> 
>  On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
> > On 2015-10-16 01:47, Peter Rosin wrote:
> >> On 2015-10-14 07:43, Ludovic Desroches wrote:
> >>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
>  On 2015-10-13 18:47, Cyrille Pitchen wrote:
> > Le 13/10/2015 17:19, Peter Rosin a écrit :
> >> On 2015-10-13 16:21, Ludovic Desroches wrote:
> 
>  [...]
> 
> >> I have started to get this when I run with this patch:
> >>
> >> [   73.31] at91_i2c f0014000.i2c: RXRDY still set!
> >> [  198.20] at91_i2c f0014000.i2c: RXRDY still set!
> >> [  509.88] at91_i2c f0014000.i2c: RXRDY still set!
> >> [  615.75] at91_i2c f0014000.i2c: RXRDY still set!
> >> [  617.75] at91_i2c f0014000.i2c: RXRDY still set!
> >> [ 1766.64] at91_i2c f0014000.i2c: RXRDY still set!
> >> [ 2035.38] at91_i2c f0014000.i2c: RXRDY still set!
> >> [ 2227.19] at91_i2c f0014000.i2c: RXRDY still set!
> >> [ 2313.10] at91_i2c f0014000.i2c: RXRDY still set!
> >>
> >> My USB serial dongle was hung which was why I didn't notice until just 
> >> now.
> >>
> >> This is probably not when communication with the eeprom though, and 
> >> certainly not
> >> writing to it, but perhpaps when polling the temperature (using the 
> >> jc42 driver).
> >> I'll investigate further in the morning to see if I can pinpoint it.
> >
> > Yep, it's the jc42 accesses that triggers this (to the same chip as the
> > eeprom, but a different block of transistors I suppose).
> >
> > Looking at the i2c bus, the accesses normally go like this:
> >
> > [0.00] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> > [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> > [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> > [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> > [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> > [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> > [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> > [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> >
> > I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
> > limiter in the jc42 driver to get more frequent incidents).
> >
> > But when the diagnostic (RXRDY still set!) is output it continues
> > with this:
> >
> > [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> > [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> > [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> > [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> >
> > Notice the ~5 ms delay (about the time it should take to output
> > the diagnostic at 115200 baud) after the access to register 0x05
> > at 0.399755.
> >
> > This is the only thing that I can observe on the bus, so it appears
> > to be harmless.
> >
> > It appears that the i2c access at 0.399755 finds the TWI
> > registers in an odd state, but nothing from the access at
> > 0.198520 appears to have gone wrong. Is this a race? Anyway,
> > the diagnostic is pretty frequent and annoying. printk_once?
> 
>  I'll try to reproduce it on my side. The only issue you have is having
>  the message about RXRDY? I mean no issue with i2c transfers?
> >>>
> >>> Exactly, the only issue is the message, the bus looks good and transfers
> >>> work as they should AFAICT.
>  It is not the possible bug we had in mind, this bug will prevent the
>  master device to release the i2c bus. It will stop the transfer but
>  without sending a stop on the bus.
> >>>
> >>> Agreed, this is not about the extra frame caused by the spurious write
> >>> to the THR register. This is something else.
> >>>
> >>> One suspicion is that the driver gets an unexpected irq from its own
> >>> NACK (the one that it puts out to end the read) and races with the
> >>> expected interrupt at TXCOMP?
> >>>
> >>
> >> I have discussed with the designer of this IP. So the NACK flag is only
> >> used for 'real' nack not protocol ones.
> >>
> >> Concerning the read issue, it have attached a patch (apply it on top of
> >> Cyrille's patch). Could you have a try? I have reproduced your issue only 
> >> one
> >> time so it's hard for me to tell if it works well or not.
> > 
> > Since you removed the dev_err call, I don't really know if the underlying
> > issue is solved or not (if there ever was any).
> > 
> > Reading RHR in the if at the top of at91_twi_read_next_byte() never happens
> > for me (I added a witness and left it running overnight), so that
> > seems 

Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-21 Thread Peter Rosin
On 2015-10-21 09:21, Peter Rosin wrote:
> On 2015-10-20 15:27, Ludovic Desroches wrote:
>> On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote:
>>> On 2015-10-19 10:51, Ludovic Desroches wrote:
 Hi Peter,

 On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
> On 2015-10-16 01:47, Peter Rosin wrote:
>> On 2015-10-14 07:43, Ludovic Desroches wrote:
>>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
 On 2015-10-13 18:47, Cyrille Pitchen wrote:
> Le 13/10/2015 17:19, Peter Rosin a écrit :
>> On 2015-10-13 16:21, Ludovic Desroches wrote:

 [...]

>> I have started to get this when I run with this patch:
>>
>> [   73.31] at91_i2c f0014000.i2c: RXRDY still set!
>> [  198.20] at91_i2c f0014000.i2c: RXRDY still set!
>> [  509.88] at91_i2c f0014000.i2c: RXRDY still set!
>> [  615.75] at91_i2c f0014000.i2c: RXRDY still set!
>> [  617.75] at91_i2c f0014000.i2c: RXRDY still set!
>> [ 1766.64] at91_i2c f0014000.i2c: RXRDY still set!
>> [ 2035.38] at91_i2c f0014000.i2c: RXRDY still set!
>> [ 2227.19] at91_i2c f0014000.i2c: RXRDY still set!
>> [ 2313.10] at91_i2c f0014000.i2c: RXRDY still set!
>>
>> My USB serial dongle was hung which was why I didn't notice until just 
>> now.
>>
>> This is probably not when communication with the eeprom though, and 
>> certainly not
>> writing to it, but perhpaps when polling the temperature (using the jc42 
>> driver).
>> I'll investigate further in the morning to see if I can pinpoint it.
>
> Yep, it's the jc42 accesses that triggers this (to the same chip as the
> eeprom, but a different block of transistors I suppose).
>
> Looking at the i2c bus, the accesses normally go like this:
>
> [0.00] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>
> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
> limiter in the jc42 driver to get more frequent incidents).
>
> But when the diagnostic (RXRDY still set!) is output it continues
> with this:
>
> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>
> Notice the ~5 ms delay (about the time it should take to output
> the diagnostic at 115200 baud) after the access to register 0x05
> at 0.399755.
>
> This is the only thing that I can observe on the bus, so it appears
> to be harmless.
>
> It appears that the i2c access at 0.399755 finds the TWI
> registers in an odd state, but nothing from the access at
> 0.198520 appears to have gone wrong. Is this a race? Anyway,
> the diagnostic is pretty frequent and annoying. printk_once?

 I'll try to reproduce it on my side. The only issue you have is having
 the message about RXRDY? I mean no issue with i2c transfers?
>>>
>>> Exactly, the only issue is the message, the bus looks good and transfers
>>> work as they should AFAICT.
 It is not the possible bug we had in mind, this bug will prevent the
 master device to release the i2c bus. It will stop the transfer but
 without sending a stop on the bus.
>>>
>>> Agreed, this is not about the extra frame caused by the spurious write
>>> to the THR register. This is something else.
>>>
>>> One suspicion is that the driver gets an unexpected irq from its own
>>> NACK (the one that it puts out to end the read) and races with the
>>> expected interrupt at TXCOMP?
>>>
>>
>> I have discussed with the designer of this IP. So the NACK flag is only
>> used for 'real' nack not protocol ones.
>>
>> Concerning the read issue, it have attached a patch (apply it on top of
>> Cyrille's patch). Could you have a try? I have reproduced your issue only one
>> time so it's hard for me to tell if it works well or not.
> 
> Since you removed the dev_err call, I don't really know if the underlying
> issue is solved or not (if there ever was any).
> 
> Reading RHR in the if at the top of at91_twi_read_next_byte() never happens
> for me (I added a witness and left it running overnight), so that
> seems like an unrelated change.

Ahhh, now I see. Why is it that I have to hit send before I start to realize
things? You have moved the RHR read from at91_do_twi_transfer()...

> Checking RXRDY before TXCOMP|NACK seems about equal, but what do I know?

...and there is no else 

Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-20 Thread Ludovic Desroches
On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote:
> On 2015-10-19 10:51, Ludovic Desroches wrote:
> > Hi Peter,
> > 
> > On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
> >> On 2015-10-16 01:47, Peter Rosin wrote:
> >>> On 2015-10-14 07:43, Ludovic Desroches wrote:
>  On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
> > On 2015-10-13 18:47, Cyrille Pitchen wrote:
> >> Le 13/10/2015 17:19, Peter Rosin a écrit :
> >>> On 2015-10-13 16:21, Ludovic Desroches wrote:
> > 
> > [...]
> > 
> >>> I have started to get this when I run with this patch:
> >>>
> >>> [   73.31] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [  198.20] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [  509.88] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [  615.75] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [  617.75] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [ 1766.64] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [ 2035.38] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [ 2227.19] at91_i2c f0014000.i2c: RXRDY still set!
> >>> [ 2313.10] at91_i2c f0014000.i2c: RXRDY still set!
> >>>
> >>> My USB serial dongle was hung which was why I didn't notice until just 
> >>> now.
> >>>
> >>> This is probably not when communication with the eeprom though, and 
> >>> certainly not
> >>> writing to it, but perhpaps when polling the temperature (using the jc42 
> >>> driver).
> >>> I'll investigate further in the morning to see if I can pinpoint it.
> >>
> >> Yep, it's the jc42 accesses that triggers this (to the same chip as the
> >> eeprom, but a different block of transistors I suppose).
> >>
> >> Looking at the i2c bus, the accesses normally go like this:
> >>
> >> [0.00] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> >> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> >> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> >> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> >> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> >> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> >> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> >> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> >>
> >> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
> >> limiter in the jc42 driver to get more frequent incidents).
> >>
> >> But when the diagnostic (RXRDY still set!) is output it continues
> >> with this:
> >>
> >> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> >> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> >> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> >> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> >>
> >> Notice the ~5 ms delay (about the time it should take to output
> >> the diagnostic at 115200 baud) after the access to register 0x05
> >> at 0.399755.
> >>
> >> This is the only thing that I can observe on the bus, so it appears
> >> to be harmless.
> >>
> >> It appears that the i2c access at 0.399755 finds the TWI
> >> registers in an odd state, but nothing from the access at
> >> 0.198520 appears to have gone wrong. Is this a race? Anyway,
> >> the diagnostic is pretty frequent and annoying. printk_once?
> > 
> > I'll try to reproduce it on my side. The only issue you have is having
> > the message about RXRDY? I mean no issue with i2c transfers?
> 
> Exactly, the only issue is the message, the bus looks good and transfers
> work as they should AFAICT.
> > It is not the possible bug we had in mind, this bug will prevent the
> > master device to release the i2c bus. It will stop the transfer but
> > without sending a stop on the bus.
> 
> Agreed, this is not about the extra frame caused by the spurious write
> to the THR register. This is something else.
> 
> One suspicion is that the driver gets an unexpected irq from its own
> NACK (the one that it puts out to end the read) and races with the
> expected interrupt at TXCOMP?
>

I have discussed with the designer of this IP. So the NACK flag is only
used for 'real' nack not protocol ones.

Concerning the read issue, it have attached a patch (apply it on top of
Cyrille's patch). Could you have a try? I have reproduced your issue only one
time so it's hard for me to tell if it works well or not.

Regards

Ludovic
diff --git a/drivers/i2c/busses/i2c-at91.c b/drivers/i2c/busses/i2c-at91.c
index 94c087b..05e3d8e 100644
--- a/drivers/i2c/busses/i2c-at91.c
+++ b/drivers/i2c/busses/i2c-at91.c
@@ -347,8 +347,14 @@ error:
 
 static void at91_twi_read_next_byte(struct at91_twi_dev *dev)
 {
-	if (!dev->buf_len)
+	/*
+	 * If we are in this case, it means there is garbage data in RHR, so
+	 * delete them.
+	 */
+	if (!dev->buf_len) {
+		at91_twi_read(dev, AT91_TWI_RHR);
 		return;
+	}
 
 	/* 8bit read works with and without FIFO */
 	*dev->buf = readb_relaxed(dev->base + AT91_TWI_RHR);
@@ -465,6 +471,8 @@ static irqreturn_t atmel_twi_interrupt(int irq, void *dev_id)
 
 	if (!irqstatus)
 		return IRQ_NONE;
+	if (irqstatus & AT91_TWI_RXRDY)
+		

Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-19 Thread Ludovic Desroches
Hi Peter,

On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
> On 2015-10-16 01:47, Peter Rosin wrote:
> > On 2015-10-14 07:43, Ludovic Desroches wrote:
> >> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
> >>> On 2015-10-13 18:47, Cyrille Pitchen wrote:
>  Le 13/10/2015 17:19, Peter Rosin a écrit :
> > On 2015-10-13 16:21, Ludovic Desroches wrote:

[...]

> > I have started to get this when I run with this patch:
> > 
> > [   73.31] at91_i2c f0014000.i2c: RXRDY still set!
> > [  198.20] at91_i2c f0014000.i2c: RXRDY still set!
> > [  509.88] at91_i2c f0014000.i2c: RXRDY still set!
> > [  615.75] at91_i2c f0014000.i2c: RXRDY still set!
> > [  617.75] at91_i2c f0014000.i2c: RXRDY still set!
> > [ 1766.64] at91_i2c f0014000.i2c: RXRDY still set!
> > [ 2035.38] at91_i2c f0014000.i2c: RXRDY still set!
> > [ 2227.19] at91_i2c f0014000.i2c: RXRDY still set!
> > [ 2313.10] at91_i2c f0014000.i2c: RXRDY still set!
> > 
> > My USB serial dongle was hung which was why I didn't notice until just now.
> > 
> > This is probably not when communication with the eeprom though, and 
> > certainly not
> > writing to it, but perhpaps when polling the temperature (using the jc42 
> > driver).
> > I'll investigate further in the morning to see if I can pinpoint it.
> 
> Yep, it's the jc42 accesses that triggers this (to the same chip as the
> eeprom, but a different block of transistors I suppose).
> 
> Looking at the i2c bus, the accesses normally go like this:
> 
> [0.00] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> 
> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
> limiter in the jc42 driver to get more frequent incidents).
> 
> But when the diagnostic (RXRDY still set!) is output it continues
> with this:
> 
> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
> 
> Notice the ~5 ms delay (about the time it should take to output
> the diagnostic at 115200 baud) after the access to register 0x05
> at 0.399755.
> 
> This is the only thing that I can observe on the bus, so it appears
> to be harmless.
> 
> It appears that the i2c access at 0.399755 finds the TWI
> registers in an odd state, but nothing from the access at
> 0.198520 appears to have gone wrong. Is this a race? Anyway,
> the diagnostic is pretty frequent and annoying. printk_once?

I'll try to reproduce it on my side. The only issue you have is having
the message about RXRDY? I mean no issue with i2c transfers?

It is not the possible bug we had in mind, this bug will prevent the
master device to release the i2c bus. It will stop the transfer but
without sending a stop on the bus.

Regards

Ludovic
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-19 Thread Peter Rosin
On 2015-10-19 10:51, Ludovic Desroches wrote:
> Hi Peter,
> 
> On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
>> On 2015-10-16 01:47, Peter Rosin wrote:
>>> On 2015-10-14 07:43, Ludovic Desroches wrote:
 On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
> On 2015-10-13 18:47, Cyrille Pitchen wrote:
>> Le 13/10/2015 17:19, Peter Rosin a écrit :
>>> On 2015-10-13 16:21, Ludovic Desroches wrote:
> 
> [...]
> 
>>> I have started to get this when I run with this patch:
>>>
>>> [   73.31] at91_i2c f0014000.i2c: RXRDY still set!
>>> [  198.20] at91_i2c f0014000.i2c: RXRDY still set!
>>> [  509.88] at91_i2c f0014000.i2c: RXRDY still set!
>>> [  615.75] at91_i2c f0014000.i2c: RXRDY still set!
>>> [  617.75] at91_i2c f0014000.i2c: RXRDY still set!
>>> [ 1766.64] at91_i2c f0014000.i2c: RXRDY still set!
>>> [ 2035.38] at91_i2c f0014000.i2c: RXRDY still set!
>>> [ 2227.19] at91_i2c f0014000.i2c: RXRDY still set!
>>> [ 2313.10] at91_i2c f0014000.i2c: RXRDY still set!
>>>
>>> My USB serial dongle was hung which was why I didn't notice until just now.
>>>
>>> This is probably not when communication with the eeprom though, and 
>>> certainly not
>>> writing to it, but perhpaps when polling the temperature (using the jc42 
>>> driver).
>>> I'll investigate further in the morning to see if I can pinpoint it.
>>
>> Yep, it's the jc42 accesses that triggers this (to the same chip as the
>> eeprom, but a different block of transistors I suppose).
>>
>> Looking at the i2c bus, the accesses normally go like this:
>>
>> [0.00] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>
>> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
>> limiter in the jc42 driver to get more frequent incidents).
>>
>> But when the diagnostic (RXRDY still set!) is output it continues
>> with this:
>>
>> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>
>> Notice the ~5 ms delay (about the time it should take to output
>> the diagnostic at 115200 baud) after the access to register 0x05
>> at 0.399755.
>>
>> This is the only thing that I can observe on the bus, so it appears
>> to be harmless.
>>
>> It appears that the i2c access at 0.399755 finds the TWI
>> registers in an odd state, but nothing from the access at
>> 0.198520 appears to have gone wrong. Is this a race? Anyway,
>> the diagnostic is pretty frequent and annoying. printk_once?
> 
> I'll try to reproduce it on my side. The only issue you have is having
> the message about RXRDY? I mean no issue with i2c transfers?

Exactly, the only issue is the message, the bus looks good and transfers
work as they should AFAICT.

> It is not the possible bug we had in mind, this bug will prevent the
> master device to release the i2c bus. It will stop the transfer but
> without sending a stop on the bus.

Agreed, this is not about the extra frame caused by the spurious write
to the THR register. This is something else.

One suspicion is that the driver gets an unexpected irq from its own
NACK (the one that it puts out to end the read) and races with the
expected interrupt at TXCOMP?

Cheers,
Peter

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-16 Thread Peter Rosin
On 2015-10-16 01:47, Peter Rosin wrote:
> On 2015-10-14 07:43, Ludovic Desroches wrote:
>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
>>> On 2015-10-13 18:47, Cyrille Pitchen wrote:
 Le 13/10/2015 17:19, Peter Rosin a écrit :
> On 2015-10-13 16:21, Ludovic Desroches wrote:
>> From: Cyrille Pitchen 
>>
>> In some cases a NACK interrupt may be pending in the Status Register (SR)
>> as a result of a previous transfer. However at91_do_twi_transfer() did 
>> not
>> read the SR to clear pending interruptions before starting a new 
>> transfer.
>> Hence a NACK interrupt rose as soon as it was enabled again at the I2C
>> controller level, resulting in a wrong sequence of operations and strange
>> patterns of behaviour on the I2C bus, such as a clock stretch followed by
>> a restart of the transfer.
>>
>> This first issue occurred with both DMA and PIO write transfers.
>>
>> Also when a NACK error was detected during a PIO write transfer, the
>> interrupt handler used to wrongly start a new transfer by writing into 
>> the
>> Transmit Holding Register (THR). Then the I2C slave was likely to reply
>> with a second NACK.
>>
>> This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
>> status bit only if both the TXCOMP and NACK status bits are cleared.
>>
>> Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91
>> kernel image. Adapted to linux-next.
>>
>> Signed-off-by: Cyrille Pitchen 
>> Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA 
>> controller")
>> Reported-by: Peter Rosin 
>> Signed-off-by: Ludovic Desroches 
>> Cc: sta...@vger.kernel.org #4.1
>
> The regression is gone with this patch (vanilla 4.2), thank you!
>
> However, looking at the bus, there are two NACKs after each
> successful chunk of memory written by the eeprom driver.
>
> Looking at the eeprom driver, I expect this on the bus:
>
> S 0x50 0x00 "hello there guys" P
> S 0x50 NACK P
> delay for at least 1 ms (since the eeprom driver has a msleep(1) call).
> S 0x50 NACK P
> delay for at least 1 ms
> ...
> ...
> S 0x50 NACK P
> delay for at least 1 ms
> S 0x50 0x10 "and girls\n" P
>
> This is not what I observe on the bus, most of the time there is a
> second NACK immediately following the first. I suspect that it is
> the i2c bus driver that somehow confuses itself and reissues the
> command for some reason?
>
> But this behavior has been there since the beginning, so it's probably
> orthogonal, and killing the data corrupting regression is much more
> important to me than fussing over a surplus failed transfer. Hence
>
> Tested-by: Peter Rosin 
>
> Cheers,
> Peter
>

 Hi Peter,

 sama5d3x and sama5d4x don't support the so called "Alternative Command 
 mode"
 whereas sama5d2x do. The Alternative Command mode comes with a new hardware
 mechanism inside the I2C controller which locks the transmission of data on
 the I2C bus when a NACK is detected. It means that even if a data is 
 written
 into the THR, the I2C controller doesn't push this data on the I2C bus but
 retains the data in the THR (and its associated FIFO for sama5d2x and 
 future
 SoCs) until the driver unlocks the transmitter by writing the LOCKCLR (Lock
 Clear) bit in the Control Register. Then and only then, the transmitter 
 outputs
 pending data again.
 This new mechanism was introduced to cope with an unwanted DMA write into 
 the
 THR after a NACK. Indeed, as I've tried to explain in my patch, when a 
 first
 NACK is detected, the I2C controller sets the TXCOMP, NACK and TXRDY bits
 alltogether in the Status Register. However setting the TXRDY bit also 
 triggers
 the DMA controller to write the next data into the THR. Pitifully, WITHOUT 
 the
 new lock mechanism, writing into the THR starts a new I2C frame. Since the
 eeprom is likely not to be ready yet, it replies by a second NACK. So you
 see on the scope two consecutive NACKs.

 On sama5d3x and sama5d4x, which do not support this lock mechanism, you are
 likely to see a successful write transfer followed by two NACKs then a 
 delay
 and finally a new successful write transfer. This is the case 2b:

 1 - A successfull write transfer is completed.
 2 - The at24 driver immediately tries to perform the next write transfer...
 3 - ... but the eeprom is not ready yet and replies with a first NACK.
 4 - The I2C controller detects this NACK and sets the NACK, TXCOMP and 
 TXRDY
 bits in its Status Register 

Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-13 Thread Ludovic Desroches
On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
> On 2015-10-13 18:47, Cyrille Pitchen wrote:
> > Le 13/10/2015 17:19, Peter Rosin a écrit :
> >> On 2015-10-13 16:21, Ludovic Desroches wrote:
> >>> From: Cyrille Pitchen 
> >>>
> >>> In some cases a NACK interrupt may be pending in the Status Register (SR)
> >>> as a result of a previous transfer. However at91_do_twi_transfer() did not
> >>> read the SR to clear pending interruptions before starting a new transfer.
> >>> Hence a NACK interrupt rose as soon as it was enabled again at the I2C
> >>> controller level, resulting in a wrong sequence of operations and strange
> >>> patterns of behaviour on the I2C bus, such as a clock stretch followed by
> >>> a restart of the transfer.
> >>>
> >>> This first issue occurred with both DMA and PIO write transfers.
> >>>
> >>> Also when a NACK error was detected during a PIO write transfer, the
> >>> interrupt handler used to wrongly start a new transfer by writing into the
> >>> Transmit Holding Register (THR). Then the I2C slave was likely to reply
> >>> with a second NACK.
> >>>
> >>> This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
> >>> status bit only if both the TXCOMP and NACK status bits are cleared.
> >>>
> >>> Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91
> >>> kernel image. Adapted to linux-next.
> >>>
> >>> Signed-off-by: Cyrille Pitchen 
> >>> Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA 
> >>> controller")
> >>> Reported-by: Peter Rosin 
> >>> Signed-off-by: Ludovic Desroches 
> >>> Cc: sta...@vger.kernel.org #4.1
> >>
> >> The regression is gone with this patch (vanilla 4.2), thank you!
> >>
> >> However, looking at the bus, there are two NACKs after each
> >> successful chunk of memory written by the eeprom driver.
> >>
> >> Looking at the eeprom driver, I expect this on the bus:
> >>
> >> S 0x50 0x00 "hello there guys" P
> >> S 0x50 NACK P
> >> delay for at least 1 ms (since the eeprom driver has a msleep(1) call).
> >> S 0x50 NACK P
> >> delay for at least 1 ms
> >> ...
> >> ...
> >> S 0x50 NACK P
> >> delay for at least 1 ms
> >> S 0x50 0x10 "and girls\n" P
> >>
> >> This is not what I observe on the bus, most of the time there is a
> >> second NACK immediately following the first. I suspect that it is
> >> the i2c bus driver that somehow confuses itself and reissues the
> >> command for some reason?
> >>
> >> But this behavior has been there since the beginning, so it's probably
> >> orthogonal, and killing the data corrupting regression is much more
> >> important to me than fussing over a surplus failed transfer. Hence
> >>
> >> Tested-by: Peter Rosin 
> >>
> >> Cheers,
> >> Peter
> >>
> > 
> > Hi Peter,
> > 
> > sama5d3x and sama5d4x don't support the so called "Alternative Command mode"
> > whereas sama5d2x do. The Alternative Command mode comes with a new hardware
> > mechanism inside the I2C controller which locks the transmission of data on
> > the I2C bus when a NACK is detected. It means that even if a data is written
> > into the THR, the I2C controller doesn't push this data on the I2C bus but
> > retains the data in the THR (and its associated FIFO for sama5d2x and future
> > SoCs) until the driver unlocks the transmitter by writing the LOCKCLR (Lock
> > Clear) bit in the Control Register. Then and only then, the transmitter 
> > outputs
> > pending data again.
> > This new mechanism was introduced to cope with an unwanted DMA write into 
> > the
> > THR after a NACK. Indeed, as I've tried to explain in my patch, when a first
> > NACK is detected, the I2C controller sets the TXCOMP, NACK and TXRDY bits
> > alltogether in the Status Register. However setting the TXRDY bit also 
> > triggers
> > the DMA controller to write the next data into the THR. Pitifully, WITHOUT 
> > the
> > new lock mechanism, writing into the THR starts a new I2C frame. Since the
> > eeprom is likely not to be ready yet, it replies by a second NACK. So you
> > see on the scope two consecutive NACKs.
> > 
> > On sama5d3x and sama5d4x, which do not support this lock mechanism, you are
> > likely to see a successful write transfer followed by two NACKs then a delay
> > and finally a new successful write transfer. This is the case 2b:
> > 
> > 1 - A successfull write transfer is completed.
> > 2 - The at24 driver immediately tries to perform the next write transfer...
> > 3 - ... but the eeprom is not ready yet and replies with a first NACK.
> > 4 - The I2C controller detects this NACK and sets the NACK, TXCOMP and TXRDY
> > bits in its Status Register (and interrupts the CPU).
> > 5 - The DMA controller is triggered by the TXRDY bit to write the next data
> > into the THR.
> > 6 - WITHTOUT the lock mechanism, the I2C controller starts a new frame 
> > because
> >of the write into the THR 

Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-13 Thread Peter Rosin
On 2015-10-13 16:21, Ludovic Desroches wrote:
> From: Cyrille Pitchen 
> 
> In some cases a NACK interrupt may be pending in the Status Register (SR)
> as a result of a previous transfer. However at91_do_twi_transfer() did not
> read the SR to clear pending interruptions before starting a new transfer.
> Hence a NACK interrupt rose as soon as it was enabled again at the I2C
> controller level, resulting in a wrong sequence of operations and strange
> patterns of behaviour on the I2C bus, such as a clock stretch followed by
> a restart of the transfer.
> 
> This first issue occurred with both DMA and PIO write transfers.
> 
> Also when a NACK error was detected during a PIO write transfer, the
> interrupt handler used to wrongly start a new transfer by writing into the
> Transmit Holding Register (THR). Then the I2C slave was likely to reply
> with a second NACK.
> 
> This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
> status bit only if both the TXCOMP and NACK status bits are cleared.
> 
> Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91
> kernel image. Adapted to linux-next.
> 
> Signed-off-by: Cyrille Pitchen 
> Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA 
> controller")
> Reported-by: Peter Rosin 
> Signed-off-by: Ludovic Desroches 
> Cc: sta...@vger.kernel.org #4.1

The regression is gone with this patch (vanilla 4.2), thank you!

However, looking at the bus, there are two NACKs after each
successful chunk of memory written by the eeprom driver.

Looking at the eeprom driver, I expect this on the bus:

S 0x50 0x00 "hello there guys" P
S 0x50 NACK P
delay for at least 1 ms (since the eeprom driver has a msleep(1) call).
S 0x50 NACK P
delay for at least 1 ms
...
...
S 0x50 NACK P
delay for at least 1 ms
S 0x50 0x10 "and girls\n" P

This is not what I observe on the bus, most of the time there is a
second NACK immediately following the first. I suspect that it is
the i2c bus driver that somehow confuses itself and reissues the
command for some reason?

But this behavior has been there since the beginning, so it's probably
orthogonal, and killing the data corrupting regression is much more
important to me than fussing over a surplus failed transfer. Hence

Tested-by: Peter Rosin 

Cheers,
Peter

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-13 Thread Cyrille Pitchen
Le 13/10/2015 17:19, Peter Rosin a écrit :
> On 2015-10-13 16:21, Ludovic Desroches wrote:
>> From: Cyrille Pitchen 
>>
>> In some cases a NACK interrupt may be pending in the Status Register (SR)
>> as a result of a previous transfer. However at91_do_twi_transfer() did not
>> read the SR to clear pending interruptions before starting a new transfer.
>> Hence a NACK interrupt rose as soon as it was enabled again at the I2C
>> controller level, resulting in a wrong sequence of operations and strange
>> patterns of behaviour on the I2C bus, such as a clock stretch followed by
>> a restart of the transfer.
>>
>> This first issue occurred with both DMA and PIO write transfers.
>>
>> Also when a NACK error was detected during a PIO write transfer, the
>> interrupt handler used to wrongly start a new transfer by writing into the
>> Transmit Holding Register (THR). Then the I2C slave was likely to reply
>> with a second NACK.
>>
>> This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
>> status bit only if both the TXCOMP and NACK status bits are cleared.
>>
>> Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91
>> kernel image. Adapted to linux-next.
>>
>> Signed-off-by: Cyrille Pitchen 
>> Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA 
>> controller")
>> Reported-by: Peter Rosin 
>> Signed-off-by: Ludovic Desroches 
>> Cc: sta...@vger.kernel.org #4.1
> 
> The regression is gone with this patch (vanilla 4.2), thank you!
> 
> However, looking at the bus, there are two NACKs after each
> successful chunk of memory written by the eeprom driver.
> 
> Looking at the eeprom driver, I expect this on the bus:
> 
> S 0x50 0x00 "hello there guys" P
> S 0x50 NACK P
> delay for at least 1 ms (since the eeprom driver has a msleep(1) call).
> S 0x50 NACK P
> delay for at least 1 ms
> ...
> ...
> S 0x50 NACK P
> delay for at least 1 ms
> S 0x50 0x10 "and girls\n" P
> 
> This is not what I observe on the bus, most of the time there is a
> second NACK immediately following the first. I suspect that it is
> the i2c bus driver that somehow confuses itself and reissues the
> command for some reason?
> 
> But this behavior has been there since the beginning, so it's probably
> orthogonal, and killing the data corrupting regression is much more
> important to me than fussing over a surplus failed transfer. Hence
> 
> Tested-by: Peter Rosin 
> 
> Cheers,
> Peter
> 

Hi Peter,

sama5d3x and sama5d4x don't support the so called "Alternative Command mode"
whereas sama5d2x do. The Alternative Command mode comes with a new hardware
mechanism inside the I2C controller which locks the transmission of data on
the I2C bus when a NACK is detected. It means that even if a data is written
into the THR, the I2C controller doesn't push this data on the I2C bus but
retains the data in the THR (and its associated FIFO for sama5d2x and future
SoCs) until the driver unlocks the transmitter by writing the LOCKCLR (Lock
Clear) bit in the Control Register. Then and only then, the transmitter outputs
pending data again.
This new mechanism was introduced to cope with an unwanted DMA write into the
THR after a NACK. Indeed, as I've tried to explain in my patch, when a first
NACK is detected, the I2C controller sets the TXCOMP, NACK and TXRDY bits
alltogether in the Status Register. However setting the TXRDY bit also triggers
the DMA controller to write the next data into the THR. Pitifully, WITHOUT the
new lock mechanism, writing into the THR starts a new I2C frame. Since the
eeprom is likely not to be ready yet, it replies by a second NACK. So you
see on the scope two consecutive NACKs.

On sama5d3x and sama5d4x, which do not support this lock mechanism, you are
likely to see a successful write transfer followed by two NACKs then a delay
and finally a new successful write transfer. This is the case 2b:

1 - A successfull write transfer is completed.
2 - The at24 driver immediately tries to perform the next write transfer...
3 - ... but the eeprom is not ready yet and replies with a first NACK.
4 - The I2C controller detects this NACK and sets the NACK, TXCOMP and TXRDY
bits in its Status Register (and interrupts the CPU).
5 - The DMA controller is triggered by the TXRDY bit to write the next data
into the THR.
6 - WITHTOUT the lock mechanism, the I2C controller starts a new frame because
   of the write into the THR by the DMA controller...
7 - ... but the eeprom is still not ready and replies with a second NACK.

8 - starting from 4, the CPU handles the interrupt generated by the first NACK.
Depending on the timing, it has already masked the interrupted before the
second NACK. In such a case, the NACK bit is pending in the Status
Register. Then it reports the first NACK error to the at24 driver.
9 - The at24 driver handles the NACK error: it sleeps 

Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

2015-10-13 Thread Peter Rosin
On 2015-10-13 18:47, Cyrille Pitchen wrote:
> Le 13/10/2015 17:19, Peter Rosin a écrit :
>> On 2015-10-13 16:21, Ludovic Desroches wrote:
>>> From: Cyrille Pitchen 
>>>
>>> In some cases a NACK interrupt may be pending in the Status Register (SR)
>>> as a result of a previous transfer. However at91_do_twi_transfer() did not
>>> read the SR to clear pending interruptions before starting a new transfer.
>>> Hence a NACK interrupt rose as soon as it was enabled again at the I2C
>>> controller level, resulting in a wrong sequence of operations and strange
>>> patterns of behaviour on the I2C bus, such as a clock stretch followed by
>>> a restart of the transfer.
>>>
>>> This first issue occurred with both DMA and PIO write transfers.
>>>
>>> Also when a NACK error was detected during a PIO write transfer, the
>>> interrupt handler used to wrongly start a new transfer by writing into the
>>> Transmit Holding Register (THR). Then the I2C slave was likely to reply
>>> with a second NACK.
>>>
>>> This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
>>> status bit only if both the TXCOMP and NACK status bits are cleared.
>>>
>>> Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91
>>> kernel image. Adapted to linux-next.
>>>
>>> Signed-off-by: Cyrille Pitchen 
>>> Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA 
>>> controller")
>>> Reported-by: Peter Rosin 
>>> Signed-off-by: Ludovic Desroches 
>>> Cc: sta...@vger.kernel.org #4.1
>>
>> The regression is gone with this patch (vanilla 4.2), thank you!
>>
>> However, looking at the bus, there are two NACKs after each
>> successful chunk of memory written by the eeprom driver.
>>
>> Looking at the eeprom driver, I expect this on the bus:
>>
>> S 0x50 0x00 "hello there guys" P
>> S 0x50 NACK P
>> delay for at least 1 ms (since the eeprom driver has a msleep(1) call).
>> S 0x50 NACK P
>> delay for at least 1 ms
>> ...
>> ...
>> S 0x50 NACK P
>> delay for at least 1 ms
>> S 0x50 0x10 "and girls\n" P
>>
>> This is not what I observe on the bus, most of the time there is a
>> second NACK immediately following the first. I suspect that it is
>> the i2c bus driver that somehow confuses itself and reissues the
>> command for some reason?
>>
>> But this behavior has been there since the beginning, so it's probably
>> orthogonal, and killing the data corrupting regression is much more
>> important to me than fussing over a surplus failed transfer. Hence
>>
>> Tested-by: Peter Rosin 
>>
>> Cheers,
>> Peter
>>
> 
> Hi Peter,
> 
> sama5d3x and sama5d4x don't support the so called "Alternative Command mode"
> whereas sama5d2x do. The Alternative Command mode comes with a new hardware
> mechanism inside the I2C controller which locks the transmission of data on
> the I2C bus when a NACK is detected. It means that even if a data is written
> into the THR, the I2C controller doesn't push this data on the I2C bus but
> retains the data in the THR (and its associated FIFO for sama5d2x and future
> SoCs) until the driver unlocks the transmitter by writing the LOCKCLR (Lock
> Clear) bit in the Control Register. Then and only then, the transmitter 
> outputs
> pending data again.
> This new mechanism was introduced to cope with an unwanted DMA write into the
> THR after a NACK. Indeed, as I've tried to explain in my patch, when a first
> NACK is detected, the I2C controller sets the TXCOMP, NACK and TXRDY bits
> alltogether in the Status Register. However setting the TXRDY bit also 
> triggers
> the DMA controller to write the next data into the THR. Pitifully, WITHOUT the
> new lock mechanism, writing into the THR starts a new I2C frame. Since the
> eeprom is likely not to be ready yet, it replies by a second NACK. So you
> see on the scope two consecutive NACKs.
> 
> On sama5d3x and sama5d4x, which do not support this lock mechanism, you are
> likely to see a successful write transfer followed by two NACKs then a delay
> and finally a new successful write transfer. This is the case 2b:
> 
> 1 - A successfull write transfer is completed.
> 2 - The at24 driver immediately tries to perform the next write transfer...
> 3 - ... but the eeprom is not ready yet and replies with a first NACK.
> 4 - The I2C controller detects this NACK and sets the NACK, TXCOMP and TXRDY
> bits in its Status Register (and interrupts the CPU).
> 5 - The DMA controller is triggered by the TXRDY bit to write the next data
> into the THR.
> 6 - WITHTOUT the lock mechanism, the I2C controller starts a new frame because
>of the write into the THR by the DMA controller...
> 7 - ... but the eeprom is still not ready and replies with a second NACK.
> 
> 8 - starting from 4, the CPU handles the interrupt generated by the first 
> NACK.
> Depending on the timing, it has already masked the interrupted before the
>