Wolfgang Grandegger wrote:
> Sebastian Smolorz wrote:
>> Sebastian Smolorz wrote:
>>> Hi Jan,
>>>
>>> Jan Kiszka wrote:
>>>> Wolfgang Grandegger wrote:
>>>>> you know, on the SJA1000 the bus error interrupt can result in high
>>>>> error interrupt rates and even hang the system on slow processors.
>>>>> Just unplugging the CAN cable can cause such interrupt flooding.
>>>>> This problem popped up again recently and Sebastian proposed:
>>>>>> Last summer we had a discussion about the BEI issue on the
>>>>>> socketcan-ML. Two additional handling policies popped up:
>>>>>> 1. The interface could restart itself after a certain number of
>>>>>> BEIs, thus taking responsibility from the user application.
>>>>>> 2. The BEI could be completely disabled if no one is interested in
>>>>>> this type of error frame.
>>>>> As 2. is also my preferred solution, I have implemented it. The only
>>>>> downside is that you do not see the error counter increasing when
>>>>> /proc/rtcan/devices is inspected. We also discussed 1., but
>>>>> RT-Socket-CAN does not restart the CAN controller on purpose and
>>>>> just stopping it requires user intervention.
>>>> And if there is someone listening, how is the flooding issue on cable
>>>> unplug etc. solved by option 2?
>>> Hm, maybe we could implement 1 additionally (but without automatic
>>> restart)?
>>
>> A more precise suggestion: What about letting BEIs appear until
>> passive mode is reached and if the TX error counter doesn't count up
>> any more (indication of start-up situation discovered by the SJA1000)
>> the driver ceases to read out ECC any further (thanks Stephane for the
>> hint). The controller would be still operating but not reporting BEIs
>> any more. There has to be some mechanism to let BEIs through after the
>> situation has normalized. Maybe the driver could check inside the
>> interrupt handler if active mode was reached again after the above
>> situation occurred.
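
To make the suggestion quoted above a bit more concrete, here is a minimal
sketch of the decision logic only. It is not taken from the RT-Socket-CAN
SJA1000 driver: the names bei_filter, bei_filter_update and
ERR_PASSIVE_LIMIT are hypothetical, and it assumes the interrupt handler
can pass in the current TX and RX error counter values on every call. The
only fact it relies on beyond the thread is the CAN rule that a node is
error-passive once either counter reaches 128.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the real driver's API. */
#define ERR_PASSIVE_LIMIT 128 /* CAN: error-passive at counter >= 128 */

struct bei_filter {
    uint8_t last_txerr; /* TX error counter seen on the previous call */
    bool muted;         /* true while BEIs are no longer reported */
};

/*
 * Call from the interrupt handler with the current error counters.
 * Returns true if the bus error should still be reported (and ECC read),
 * false if reporting is suppressed.
 */
static bool bei_filter_update(struct bei_filter *f,
                              uint8_t txerr, uint8_t rxerr)
{
    bool passive = txerr >= ERR_PASSIVE_LIMIT ||
                   rxerr >= ERR_PASSIVE_LIMIT;

    if (!passive) {
        /* Back in error-active mode: let BEIs through again. */
        f->muted = false;
    } else if (txerr <= f->last_txerr) {
        /* Passive and the TX error counter no longer counts up. */
        f->muted = true;
    }

    f->last_txerr = txerr;
    return !f->muted;
}
```

In this sketch the filter stays muted until error-active mode is reached
again, even if the TX error counter starts counting up in between. Whether
that is the right policy, and how the handler gets called once BEIs have
stopped, is left open here.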
>
> Well, this is rather sophisticated and needs some more careful
> evaluation. We might also reach the passive level slowly without
> flooding. Furthermore, the method should also be applicable for other
> controllers.

What is the current behaviour of other controllers?

>
> Let's implement 1. and downscaled printk and wait for the users'
> reaction, see also my other mail. Then we should bring up this
> discussion again on the Socket-CAN-ML to negotiate a common solution.

Instead of waiting on some user triggering a (potential) latency mine, I
would prefer that we experimentally evaluate the effect, e.g. via an
I-pipe tracer dump on a faster and a slower box. I would offer to run
some demo code here on our PC104 Phytec boards as well.

The problem is to define what degree of error-related IRQ load is
generally acceptable. We surely can't do this, so we have to document
the effect /at least/ and help the users to check it on their own - or
we have to avoid it / make it insignificant compared to normal CAN
operation (I'm still in favour of this path).

Jan
_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
