Hi everybody,

after a hard day of debugging we finally found the bug in RTnet that caused the issue. I have attached a patch for rtnet-0.9.3 that fixes the bug and makes the function __rtskb_fifo_remove really interrupt-safe.

Jan, thanks for your great support! Could you please check the patch into Subversion?
Regards
Mathias

> Jan Kiszka wrote:
> > Jan Kiszka wrote:
> >> M. Koehrer wrote:
> >>> Hi everybody,
> >>>
> >>> I have the assumption that one IRQ lock is missing in stackmgr_task():
> >>> In my setup I have two realtime NICs in use with rtnet.
> >>> And for this setup, I think the assumption in the function stackmgr_task()
> >>>
> >>> /* we are the only reader => no locking required */
> >>> skb = __rtskb_fifo_remove(&rx.fifo);
> >>>
> >>> is not valid, as the interrupt routine of the NIC could write to rx.fifo
> >>> via rtnetif_rx() while the stackmgr_task() is about to read from rx.fifo
> >>> (e.g. triggered by the other NIC's IRQ).
> >>> Even with one NIC it is not safe when there are many short messages
> >>> arriving very fast.
> >> In theory it is safe.
> >>
> >>> Thus, I think the line above has to be changed to
> >>> skb = rtskb_fifo_remove(&rx.fifo);
> >>>
> >>> to be really safe!
> >>>
> >>> I actually had a problem with lost packets with two NICs enabled. I
> >>> hope the fix above helps to fix the issue.
> >>> At least, the first short test looks promising.
> >> Well, not good. That's why using lock-less algorithms is so much fun:
> >> it's fairly easy to shoot yourself in the knee without even knowing it...
> >>
> >> I'm going to rethink this carefully again.
> >
> > The number of knots in my brain is already increasing. Before this gets
> > critical:
> >
> > Could you send me your rtnet.o (or .ko) privately? Without the patch
> > applied. I need to have a look at the disassembly. BTW, SMP or UP?
> >
> > Further question: Did you find any messages in your kernel log after
> > losing packets? Something about dropped packets?
>
> Short update for the readers of this list:
>
> Mathias's problem persists. The reason still remains unclear while at
> the same time the lock-less FIFO code actually appears to be correct to
> both of us. Mathias found out that once in a while an internal buffer
> (rtskb) gets used twice, thus the whole packet reception flow of RTnet
> becomes corrupted.
>
> I tried to reproduce his scenario, also setting up two NICs and heavily
> loading the box with parallel data streams (ICMP and UDP). Even after
> hours of testing no problems showed up here (ok, I found and fixed a
> minor device dereferencing issue under overload, but that bug was
> unrelated). Given that our software environments are too different (RTAI
> 3.3-cv over 2.4.32, gcc-2.95 vs. Xenomai 2.2 over 2.6.17.6, gcc-4.1), no
> real conclusion can be drawn yet.
>
> We will keep you posted.
>
> Jan

--
Mathias Koehrer
[EMAIL PROTECTED]
rtskb_fifo.patch
Description: Binary data
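
For readers without the attachment, the distinction discussed in this thread between an unlocked remove and a lock-protected remove can be sketched roughly as follows. This is not the RTnet code and not the content of the patch: it is a user-space illustration only, and every name in it (demo_fifo, demo_fifo_remove_unlocked, demo_fifo_remove, DEMO_FIFO_SIZE) is hypothetical. A C11 atomic_flag spinlock stands in for the IRQ lock that a real-time kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_FIFO_SIZE 8 /* hypothetical size; power of two so masking works */

struct demo_fifo {
    atomic_flag lock;        /* stand-in for an IRQ-safe lock (assumption) */
    unsigned long read_pos;  /* advanced only by the reader */
    unsigned long write_pos; /* advanced only by the writer */
    void *slots[DEMO_FIFO_SIZE];
};

static void demo_fifo_init(struct demo_fifo *f)
{
    assert(f != NULL);
    atomic_flag_clear(&f->lock);
    f->read_pos = 0;
    f->write_pos = 0;
    for (size_t i = 0; i < DEMO_FIFO_SIZE; i++)
        f->slots[i] = NULL;
}

static void demo_lock(struct demo_fifo *f)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire))
        ;
}

static void demo_unlock(struct demo_fifo *f)
{
    atomic_flag_clear_explicit(&f->lock, memory_order_release);
}

/* Unlocked insert: returns 0 on success, -1 if the FIFO is full. */
static int demo_fifo_insert_unlocked(struct demo_fifo *f, void *item)
{
    if (f->write_pos - f->read_pos == DEMO_FIFO_SIZE)
        return -1;
    f->slots[f->write_pos & (DEMO_FIFO_SIZE - 1)] = item;
    f->write_pos++;
    return 0;
}

/* Unlocked remove: only valid if nothing else touches the FIFO meanwhile. */
static void *demo_fifo_remove_unlocked(struct demo_fifo *f)
{
    void *item;

    if (f->read_pos == f->write_pos)
        return NULL; /* empty */
    item = f->slots[f->read_pos & (DEMO_FIFO_SIZE - 1)];
    f->read_pos++;
    return item;
}

/* Locked variants: the whole operation runs inside the critical section. */
static int demo_fifo_insert(struct demo_fifo *f, void *item)
{
    int ret;

    demo_lock(f);
    ret = demo_fifo_insert_unlocked(f, item);
    demo_unlock(f);
    return ret;
}

static void *demo_fifo_remove(struct demo_fifo *f)
{
    void *item;

    demo_lock(f);
    item = demo_fifo_remove_unlocked(f);
    demo_unlock(f);
    return item;
}
```

A caller would initialize the structure with demo_fifo_init() and then use demo_fifo_insert() and demo_fifo_remove() from both contexts. The unlocked variants are correct only under the "we are the only reader" kind of assumption quoted above; the locked variants trade a short critical section for not depending on that assumption.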
_______________________________________________ RTnet-users mailing list RTnet-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rtnet-users