Romain Lenglet wrote:
>>Please let me know if you experience any problems, so I can
>>update the driver...
>
> Ok, thank you. I *do* experience problems, hence here is a report
> as complete as possible.
>
> I have tried to use one of my 8169 interfaces as a capture
> interface on a 100Mbps network, for testing, with rtnet capture.
> Only one of my 8169 interfaces is activated in rtnet.conf
> (RT_DRIVER_OPTIONS="cards=1,0,0,0").
>
> After starting rtnet capture, the console periodically outputs a
> lot of times (15 to 30 times) the following line:
> r8169: rtl8169_rx_interrupt: Allocate n_skb failed!
> (priv->rx_buf_size = 1536)
> ...
> followed by:
> r8169: Leaving Interrupt Unhandled
>

Klaus, any ideas? The IRQ handler seems to run out of buffers, but this
should not happen with normally loaded networks and a rather unloaded box.

>
> When capturing packets (e.g. starting tethereal -i rteth0), it
> captures a few packets, then nothing for 30 seconds, then
> capture a few packets again, then nothing for 30 seconds, etc.
> (although there are constantly many broadcast packets sent on
> this network)
>
> I also see sometimes the line:
> rteth0: Too much work at interrupt!
>
> In addition, the RT driver seems to block the interrupts of
> another ethernet interface, that is non-realtime and that is
> managed by the standard Linux 3Com 3c905 driver. I get
> periodically messages on the console from that driver, such as:
> NETDEV WATCHDOG: eth2: transmit timed out
> eth2: transmit timed out, tx_status 00 status e681.
> diagnostics: net 0ccc media 8880 dma 0000003a fifo 8000
> eth2: Interrupt posted but not delivered -- IRQ blocked by
> another device?
> flags: bus-master 1, dirty 5419(11) current 5419(11)
> Transmit list 00000000 vs. f719f8e0
> 0: @f719f200 length 8000002a status 0001002a
> 1: @f719f2a0 length 8000002a status 0001002a
> (etc. 16 lines like those)
>
> (in fact, I have 3 interfaces on that PC: eth2 is a 3Com 3C905,
> and rteth0 and rteth1 are 8169-based interfaces)

Oops, try to resolve this first. It is in NO WAY recommended to share
IRQs between real-time and non-real-time devices. You may even face
crashes as we recently did here with an older RTAI version. So far, I'm
not sure if the crash is due to RTAI or a bug in RTnet, but as IRQ
sharing makes no sense from the real-time point of view anyway, we did
not yet look at this effect closer.

>
> And effectively, the traffic on the non-RT interface is stopped:
> if I had opened a ssh connection, it becomes no more responsive
> once I have started rtnet and rtcap.
> If I stop rtnet, i.e. if I put the RT interface down (no need to
> unload the RT driver module), it depends on the state of RTAI:
>
> - if RTAI becomes completely broken (i.e. /proc/rtai/sched says:
> "Cannot allocate memory"), then unloading rtnet is not
> sufficient: the non-RT network card interrupts are still
> blocked, and I need to rmmod and insmod the rt_rtdm module again
> to unblock the non-RT interface, and recover a
> normal /proc/rtai/sched output.
>
> - if RTAI is still not completely broken, it is sufficient to
> stop rtnet.
>
> Just after starting rtnet capture, here is the content of
> my /proc/rtai/sched:
> CPU   PID   PRI   TIMEOUT   STAT   NAME
> 0     0     0     0         R      ROOT
> 0     0     98    0         W      f8b5b9c0
> 0     0     1     0         W      f8b5c120
>
> But after a while, the content of /proc/rtai/sched becomes:
> /proc/rtai/sched: Cannot allocate memory
>
> (the only way to solve this is to rmmod rtai_rtdm module, and
> insmod again)
>
> My RT interface and my non-RT interface share the same IRQ 16.
> Cf the content of /proc/rtai/irq:
> IRQ   CPU0
>  16:    56
> 216:     1
> and the content of /proc/interrupts:
>           CPU0
>   0:   2751710   IO-APIC-edge    timer, rthal_broadcast_timer
>   1:      3784   IO-APIC-edge    i8042
>   7:         0   IO-APIC-edge    parport0
>   8:         1   IO-APIC-edge    rtc
>   9:         1   IO-APIC-level   acpi
>  14:      6025   IO-APIC-edge    ide0
>  15:        26   IO-APIC-edge    ide1
>  16:     18795   IO-APIC-level   eth2
>  17:         0   IO-APIC-level   uhci_hcd:usb1
>  18:         0   IO-APIC-level   Intel ICH2
>  19:         0   IO-APIC-level   uhci_hcd:usb2
> NMI:         0
> LOC:   2751621
> ERR:         0
> MIS:         0
>
> So, as a conclusion: the good news is that the driver is somewhat
> working (I can capture some packets with it...), but 1)
> interrupt handling seems to be broken, and 2) there seems to be
> a memory leak in RTAI/RTDM or in RTnet.
> Probably those two bugs are related.
>

Resolve the IRQ conflict first and then check if the problems persist.
If yes, please report again.

Jan
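
P.S. To spot such a conflict quickly, something along these lines might help. It is only a rough sketch: it assumes that /proc/rtai/irq and /proc/interrupts look like the listings quoted above, and the function names (numbered_irqs, linux_irq_owners, shared_rt_irqs) are made up for illustration and are not part of RTAI or RTnet. It reports every IRQ line that is claimed on the real-time side and also has a Linux driver attached.

```python
import re

def numbered_irqs(text):
    """Collect the numeric IRQ lines listed in a /proc-style IRQ table."""
    irqs = set()
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):", line)
        if m:
            irqs.add(int(m.group(1)))
    return irqs

def linux_irq_owners(text):
    """Map IRQ number -> driver names, assuming the /proc/interrupts layout
    quoted in the report: counters, controller type, then device names."""
    owners = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m:
            continue  # header line or NMI/LOC/ERR/MIS rows
        fields = m.group(2).split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        names = " ".join(fields[i + 1:])  # fields[i] is the controller type
        if names:
            owners[int(m.group(1))] = names
    return owners

def shared_rt_irqs(rtai_irq_text, linux_interrupts_text):
    """Return {irq: linux_owner} for IRQs present in both tables."""
    owners = linux_irq_owners(linux_interrupts_text)
    return {irq: owners[irq]
            for irq in sorted(numbered_irqs(rtai_irq_text))
            if irq in owners}

# Sample data copied from the report above (abridged).
RTAI_IRQ = """IRQ CPU0
16: 56
216: 1
"""
LINUX_INTERRUPTS = """CPU0
0: 2751710 IO-APIC-edge timer, rthal_broadcast_timer
14: 6025 IO-APIC-edge ide0
16: 18795 IO-APIC-level eth2
18: 0 IO-APIC-level Intel ICH2
NMI: 0
"""

if __name__ == "__main__":
    for irq, owner in shared_rt_irqs(RTAI_IRQ, LINUX_INTERRUPTS).items():
        print(f"IRQ {irq} is shared with the Linux driver(s): {owner}")
```

On a live box you would pass in the contents of the two /proc files instead of the sample strings. Any IRQ it reports has to be separated, e.g. by moving a card to another PCI slot, before further testing makes sense.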