Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))

Wolfgang Grandegger Mon, 19 Mar 2007 00:56:59 -0800

Sebastian Smolorz wrote:

Hi Jan,


Jan Kiszka wrote:

Wolfgang Grandegger wrote:

You know, on the SJA1000 the bus error interrupt can result in high
error interrupt rates and even hang the system on slow processors. Just
unplugging the CAN cable can cause such interrupt flooding. This problem
popped up again recently and Sebastian proposed:

Last summer we had a discussion about the BEI issue on the
socketcan-ML. Two additional handling policies popped up:
1. The interface could restart itself after an amount of BEIs, thus
   taking responsibility from the user application.
2. The BEI could be completely disabled if no one is interested in
   this type of error frame.

As 2. is also my preferred solution, I have implemented it. The only
downside is that you do not see the error counter increasing when
/proc/rtcan/devices is inspected. We also discussed 1., but
RT-Socket-CAN deliberately does not restart the CAN controller, and just
stopping it requires user intervention.
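A minimal sketch of option 2 (bus error interrupt only on request), assuming a per-device count of sockets that asked for bus error frames. It also assumes that BEIE is bit 7 (0x80) of the SJA1000 PeliCAN interrupt enable register. `struct can_dev`, `bei_listener_add()` and `bei_listener_remove()` are hypothetical names, and `ier` is only a shadow value here, not a real register write.

```c
#include <stdint.h>

/* Assumed: SJA1000 PeliCAN IER bit 7 = bus error interrupt enable. */
#define SJA_IER_BEIE 0x80u

/* Hypothetical device state; ier shadows the interrupt enable register. */
struct can_dev {
    uint8_t ier;             /* shadow of the interrupt enable register */
    unsigned int bei_users;  /* sockets that requested bus error frames */
};

/* Called when a socket installs an error filter that includes bus errors. */
void bei_listener_add(struct can_dev *dev)
{
    if (dev->bei_users++ == 0)
        dev->ier |= SJA_IER_BEIE;            /* first listener: enable BEI */
}

/* Called when such a socket closes or drops bus errors from its filter. */
void bei_listener_remove(struct can_dev *dev)
{
    if (dev->bei_users > 0 && --dev->bei_users == 0)
        dev->ier &= (uint8_t)~SJA_IER_BEIE;  /* last listener gone: disable BEI */
}
```

With no listener, the interrupt stays masked, so an unplugged cable cannot flood the CPU. This matches the trade-off Sebastian mentions: the bus error counter in /proc/rtcan/devices then no longer increases.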

And if there is someone listening, how is the flooding issue on cable
unplug etc. solved by option 2?


Hm, maybe we could implement 1 additionally (but without automatic restart)?

What about something like option 3: After the first error occurred that
may mark the beginning of a flood, disable that error interrupt until
the next stop/start cycle or the user has read the event?
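Jan's option 3 amounts to a one-shot latch: report the first bus error, mask the interrupt, and re-arm it only when the event has been read or the controller goes through stop/start. The sketch below assumes exactly that policy. All names are hypothetical, and the boolean stands in for the real interrupt mask.

```c
#include <stdbool.h>

/* Hypothetical latch state for one CAN device. */
struct bei_latch {
    bool bei_enabled;    /* is the bus error interrupt currently unmasked? */
    bool event_pending;  /* an error frame is waiting for the application */
};

/* IRQ path: report the first bus error, then mask further ones.
 * Returns true if an error frame should be queued. */
bool on_bus_error_irq(struct bei_latch *l)
{
    if (!l->bei_enabled)
        return false;          /* masked: nothing to report */
    l->bei_enabled = false;    /* stop a possible flood right here */
    l->event_pending = true;
    return true;
}

/* Re-arm once the application has consumed the error event... */
void on_error_frame_read(struct bei_latch *l)
{
    l->event_pending = false;
    l->bei_enabled = true;
}

/* ...or on the next stop/start cycle of the controller. */
void on_controller_restart(struct bei_latch *l)
{
    l->event_pending = false;
    l->bei_enabled = true;
}
```

At most one bus error interrupt fires per read or restart cycle, which bounds the interrupt rate to the application's reading rate.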

IIRC, there is no way to distinguish a "normal" bus error (e.g. a missing acknowledge) appearing during normal operation from one occurring when the cable is unplugged. The best indication is a high number of consecutive BEIs.
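The "high number of consecutive BEIs" heuristic can be sketched as a counter that any successfully received or transmitted frame resets. The threshold of 16 is an arbitrary assumption, not a value from RT-Socket-CAN, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical threshold; a real value would need tuning per setup. */
#define BEI_FLOOD_THRESHOLD 16u

struct bei_detector {
    unsigned int consecutive;  /* bus errors since the last good frame */
};

/* Returns true once the run of back-to-back bus errors reaches the
 * threshold. The counter saturates there so it cannot overflow. */
bool bei_detector_on_error(struct bei_detector *d)
{
    if (d->consecutive < BEI_FLOOD_THRESHOLD)
        d->consecutive++;
    return d->consecutive >= BEI_FLOOD_THRESHOLD;
}

/* A successfully received or transmitted frame ends the run. */
void bei_detector_on_good_frame(struct bei_detector *d)
{
    d->consecutive = 0;
}
```

An occasional acknowledge error on a healthy bus is followed by good frames and never reaches the threshold. An unplugged cable produces an uninterrupted run.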

I agree. But the controller internally counts the errors as well, which is reflected by the change of state to warning or passive. If the application is interested in more details, it could listen on error messages.
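For reference, the state changes mentioned here follow the CAN fault confinement rules applied to the transmit (TEC) and receive (REC) error counters. The sketch below assumes a warning limit of 96, which is the SJA1000 default error warning limit. It also assumes the standard thresholds: error passive from 128 and bus-off once TEC exceeds 255. The enum and function names are hypothetical.

```c
/* Hypothetical names; thresholds as described in the lead-in. */
enum can_err_state { ERR_ACTIVE, ERR_WARNING, ERR_PASSIVE, BUS_OFF };

#define CAN_WARNING_LIMIT 96u   /* assumed SJA1000 default warning limit */
#define CAN_PASSIVE_LIMIT 128u  /* error passive from 128 on */
#define CAN_BUSOFF_LIMIT  256u  /* bus-off when TEC exceeds 255 */

/* Map the error counters to the controller state. The counters are
 * modelled as unsigned int so that TEC > 255 can be expressed. */
enum can_err_state can_error_state(unsigned int tec, unsigned int rec)
{
    if (tec >= CAN_BUSOFF_LIMIT)
        return BUS_OFF;
    if (tec >= CAN_PASSIVE_LIMIT || rec >= CAN_PASSIVE_LIMIT)
        return ERR_PASSIVE;
    if (tec >= CAN_WARNING_LIMIT || rec >= CAN_WARNING_LIMIT)
        return ERR_WARNING;
    return ERR_ACTIVE;
}
```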


Let's summarize the situation with 2. (bus errors on request) available:

- Bus error interrupts are suppressed unless an application really
  requests them.

- If an application listens on error messages, a high interrupt rate
  could cause the socket buffer to overflow, resulting in lost messages.
  As far as I have seen, this is not yet a real problem, but it gets
  worse when debugging is configured and printk messages are generated:

  /* Overflow of socket's ring buffer! */
  sock->rx_buf_full++;
  RTCAN_RTDM_DBG("%s: socket buffer overflow (fd=%d), message "
                 "discarded\n",
                 rtcan_proto_raw_dev.driver_name, context->fd);

  This can indeed hang the system, and I tend to just downscale the
  frequency of the log output by, let's say, a factor of 10 or 20, and
  add to the log:

  "Not all overflows are listed. Please inspect /proc/rtcan/sockets!"
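A minimal sketch of the proposed downscaling, assuming a factor of 10. Every overflow is still counted in `rx_buf_full`, but only every tenth one is printed. `printf` stands in for `RTCAN_RTDM_DBG`, and `struct rtcan_sock_model` and `note_rx_overflow()` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical downscaling factor (the mail suggests 10 or 20). */
#define OVERFLOW_LOG_INTERVAL 10u

/* Hypothetical stand-in for the socket; only the counter matters here. */
struct rtcan_sock_model {
    unsigned long rx_buf_full;  /* overflow counter, as sock->rx_buf_full */
};

/* Count every overflow, but log only the 1st, 11th, 21st, ... one.
 * Returns 1 if a message was printed, 0 otherwise. */
int note_rx_overflow(struct rtcan_sock_model *sock, const char *drv, int fd)
{
    sock->rx_buf_full++;
    if ((sock->rx_buf_full - 1) % OVERFLOW_LOG_INTERVAL != 0)
        return 0;
    printf("%s: socket buffer overflow (fd=%d), message discarded. "
           "Not all overflows are listed. Please inspect /proc/rtcan/sockets!\n",
           drv, fd);
    return 1;
}
```

The exact count stays available through /proc/rtcan/sockets. The log output only has to signal that overflows are happening.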

Concerning 1. (stopping the device after n bus errors): I think this conflicts somewhat with 2., because the application explicitly wants to receive them. If it notices a high rate, it could react appropriately.
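Reacting to a high rate on the application side could look like the following fixed-window counter fed with the timestamps of received bus error frames. The one-second window and the limit of 100 are arbitrary assumptions, and the names are hypothetical. What the application does on a flood is left open. It might stop the device or drop bus errors from its error filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy values for an application watching error frames. */
#define BEI_WINDOW_NS   1000000000ULL  /* 1 s observation window */
#define BEI_MAX_PER_WIN 100u           /* more than this counts as a flood */

struct bei_rate {
    uint64_t window_start_ns;  /* start of the current fixed window */
    unsigned int count;        /* bus error frames seen in this window */
};

/* Feed the timestamp of each received bus error frame.
 * Returns true when the rate exceeds the limit in the current window. */
bool bei_rate_exceeded(struct bei_rate *r, uint64_t now_ns)
{
    if (now_ns - r->window_start_ns >= BEI_WINDOW_NS) {
        r->window_start_ns = now_ns;  /* start a new fixed window */
        r->count = 0;
    }
    r->count++;
    return r->count > BEI_MAX_PER_WIN;
}
```

The policy thus stays in the application that asked for bus errors, in line with keeping the driver from stopping the device on its own.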

For the moment I think 2. and downscaled printk's would already be a big improvement and should make most users happy. Let's wait for some real-world application requiring solution 1.


Wolfgang.

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
