Re: SJA1000 SMP issue with command register

Klaus Hitschler Wed, 12 May 2010 23:23:17 -0700

Hi Wolfgang,

my comments are in between your text:

On Tuesday, 11 May 2010, at 21:42:31, you wrote:
> On 05/11/2010 09:31 PM, Oliver Hartkopp wrote:
> > On 11.05.2010 20:53, Wolfgang Grandegger wrote:
> >> Hi Oliver,
> >> 
> >> On 05/10/2010 07:09 PM, Oliver Hartkopp wrote:
> >>> I wonder whether it is enough just to settle the register write of the
> >>> command register by adding the ndelay().
> >> 
> >> We need some protection, of course. Also, ndelay() is not available on
> >> all archs and might be mapped to udelay(1). In the patch you posted, an
> >> extra read is done for that purpose, I assume.
> > 
> > Yes. That was surely Klaus' intention - I just copied this little snippet
> > ;-)
> > 
> >>> But IMO the tx path should additionally keep its hands off the chip
> >>> in general while the hard-irq used for rx operations is active.
> >>> 
> >>> I would suggest sja1000_interrupt() and sja1000_start_xmit() not to run
> >>> together. What about some _bh-locking in sja1000_start_xmit() ???
> >> 
> >> Yes, fine, especially if it does solve the issue with the command
> >> register as well?
> > 
> > As Kurt already pointed out, the _bh locking is probably not enough here -
> > I think it's only used for blocking soft-irqs ...
> 
> Yes.
> 
> > Will take a closer look ...
> 
> For the time being, I tend to fix just the problem for the system where
> it shows up. It would be nice if we could reproduce it somehow. Klaus,
> on what hardware did you observe that problem?

On multiple x86 multi-core machines. The scenario was always the same: the
users received a lot of data and sent data at a low rate. In this case the
initial write (triggered in the direct control path of an ioctl) was sometimes
lost, and subsequent writes stalled because no transmit-ready interrupt was
raised. I managed to reproduce the fault multiple times.

The write stall did not happen on single core machines.

It is not enough to add a delay, since you cannot determine when the register
is accessed by another core as the result of an interrupt. Even softirq code
can be interrupted by hardirqs.
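
To illustrate why a lock is needed rather than a delay, here is a minimal
user-space sketch. It is not the driver code. Everything in it is hypothetical:
the fake_sja1000 struct, the raw_* accessors and the rule that a command write
is dropped unless the previous one has been settled by a status read are
assumptions made only for this model. A pthread mutex stands in for what would
presumably be a spinlock taken with interrupts disabled in a real driver.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the chip; not the real register layout. */
struct fake_sja1000 {
    pthread_mutex_t cmdreg_lock; /* stand-in for an IRQ-safe spinlock */
    int settled;                 /* 1 once the last command write has settled */
    unsigned int accepted;       /* command writes the model accepted */
    unsigned int lost;           /* command writes the model dropped */
};

static void fake_init(struct fake_sja1000 *dev)
{
    pthread_mutex_init(&dev->cmdreg_lock, NULL);
    dev->settled = 1;
    dev->accepted = 0;
    dev->lost = 0;
}

/* Model assumption: a write that arrives before the previous one has
 * settled is silently dropped. */
static void raw_write_cmd(struct fake_sja1000 *dev, uint8_t val)
{
    (void)val;
    if (dev->settled)
        dev->accepted++;
    else
        dev->lost++;
    dev->settled = 0;
}

/* Model assumption: reading the status register settles the last write. */
static uint8_t raw_read_status(struct fake_sja1000 *dev)
{
    dev->settled = 1;
    return 0;
}

/* Write and read-back form one critical section, so no other thread can
 * slip a command write in between them. */
static void write_cmdreg(struct fake_sja1000 *dev, uint8_t val)
{
    pthread_mutex_lock(&dev->cmdreg_lock);
    raw_write_cmd(dev, val);
    (void)raw_read_status(dev);
    pthread_mutex_unlock(&dev->cmdreg_lock);
}

struct writer_arg {
    struct fake_sja1000 *dev;
    unsigned int count;
};

static void *writer(void *p)
{
    struct writer_arg *arg = p;

    for (unsigned int i = 0; i < arg->count; i++)
        write_cmdreg(arg->dev, 0x01);
    return NULL;
}

/* Two concurrent writers, standing in for the tx path and the interrupt
 * handler on different cores. Returns the number of accepted commands. */
static unsigned int run_two_writers(unsigned int count)
{
    struct fake_sja1000 dev;
    struct writer_arg arg = { &dev, count };
    pthread_t a, b;

    fake_init(&dev);
    pthread_create(&a, NULL, writer, &arg);
    pthread_create(&b, NULL, writer, &arg);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&dev.cmdreg_lock);
    return dev.accepted;
}
```

In this model run_two_writers(n) accepts all 2*n commands, because every write
is paired with its read-back under the lock. Two raw_write_cmd() calls without
a read-back in between lose the second command, whatever delay is put between
them.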

OK, most PCI systems cannot run the designed 33 nsec bus cycles. But the
situation is no less serious with e.g. 66 nsec, 99 nsec ...

Regards,

Klaus

> 
> Wolfgang.
_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core
