Jan Kiszka wrote:
> Jan Kiszka wrote:
>> M. Koehrer wrote:
>>> Hi everybody,
>>>
>>> I have the assumption that one IRQ lock is missing in stackmgr_task():
>>> In my setup I have two realtime NICs in use with rtnet.
>>> And for this setup, I think the assumption in the function stackmgr_task()
>>>
>>>     /* we are the only reader => no locking required */
>>>     skb = __rtskb_fifo_remove(&rx.fifo);
>>>
>>> is not valid, as the interrupt routine of the NIC could write to rx.fifo
>>> via rtnetif_rx() while the stackmgr_task()
>>> is about to read from rx.fifo (e.g. triggered by the other NIC's IRQ).
>>> Even with one NIC it is not safe when there are many short messages
>>> arriving very fast.
>> In theory it is safe.
>>
>>> Thus, I think the line above has to be changed to
>>>     skb = rtskb_fifo_remove(&rx.fifo);
>>>
>>> to be really safe!
>>>
>>> I actually had a problem with lost packages with two NICs enabled. I hope
>>> the fix above helps to fix the issue.
>>> At least, the first short test looks promising.
>> Well, not good. That's why using lock-less algorithms is so much fun:
>> it's fairly easy to shoot yourself in the knee without even knowing it...
>>
>> I'm going to rethink this carefully again.
>
> The number of knots in my brain is already increasing. Before this gets
> critical:
>
> Could you send me your rtnet.o (or .ko) privately? Without the patch
> applied. I need to have a look at the disassembly. BTW, SMP or UP?
>
> Further question: Did you find any messages in your kernel log after
> losing packets? Something about dropped packets?
>
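For readers following the thread, the "only reader => no locking required" argument can be sketched roughly as below. This is a minimal illustration of a single-producer/single-consumer ring buffer, not RTnet's actual rtskb FIFO code: the names spsc_fifo, spsc_fifo_insert and spsc_fifo_remove are hypothetical, and C11 atomics are assumed purely for the sake of the example.

```c
#include <stdatomic.h>
#include <stddef.h>

#define FIFO_SIZE 16  /* hypothetical size; must be a power of two */

/* Hypothetical SPSC ring buffer; NOT the RTnet rtskb_fifo layout. */
struct spsc_fifo {
    void *slot[FIFO_SIZE];
    atomic_size_t write_pos;  /* advanced only by the producer */
    atomic_size_t read_pos;   /* advanced only by the consumer */
};

/* Producer side (e.g. an IRQ handler). Returns 0 on success, -1 if full. */
static int spsc_fifo_insert(struct spsc_fifo *f, void *item)
{
    size_t w = atomic_load_explicit(&f->write_pos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&f->read_pos, memory_order_acquire);

    if (w - r == FIFO_SIZE)
        return -1;  /* full: the caller has to drop the item */

    f->slot[w & (FIFO_SIZE - 1)] = item;
    /* Publish the slot before making the new write position visible. */
    atomic_store_explicit(&f->write_pos, w + 1, memory_order_release);
    return 0;
}

/* Consumer side (e.g. a single manager task). Returns NULL if empty. */
static void *spsc_fifo_remove(struct spsc_fifo *f)
{
    size_t r = atomic_load_explicit(&f->read_pos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&f->write_pos, memory_order_acquire);

    if (r == w)
        return NULL;  /* empty */

    void *item = f->slot[r & (FIFO_SIZE - 1)];
    /* Release the slot only after the item has been read out. */
    atomic_store_explicit(&f->read_pos, r + 1, memory_order_release);
    return item;
}
```

The point of the sketch: each index has exactly one writer, so reader and writer never modify the same variable. The scheme stops being safe as soon as two contexts can run the same side concurrently, for instance two producers inserting at once; in that case that side would need its own serialization.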
Short update for the readers of this list: Mathias's problem persists. The cause remains unclear, while the lock-less FIFO code appears to be correct to both of us. Mathias found out that once in a while an internal buffer (rtskb) gets used twice, which corrupts the whole packet reception flow of RTnet.

I tried to reproduce his scenario, also setting up two NICs and heavily loading the box with parallel data streams (ICMP and UDP). Even after hours of testing, no problems showed up here (OK, I found and fixed a minor device dereferencing issue under overload, but that bug was unrelated).

Given that our software environments are too different (RTAI 3.3-cv over 2.4.32, gcc-2.95 vs. Xenomai 2.2 over 2.6.17.6, gcc-4.1), no real conclusion can be drawn yet. We will keep you posted.

Jan