Jan Kiszka wrote:
> Jan Kiszka wrote:
>> M. Koehrer wrote:
>>> Hi everybody,
>>>
>>> I suspect that an IRQ lock is missing in stackmgr_task():
>>> In my setup I use two real-time NICs with RTnet.
>>> For this setup, I think the assumption in stackmgr_task()
>>>
>>> /* we are the only reader => no locking required */
>>>             skb = __rtskb_fifo_remove(&rx.fifo);
>>>
>>> is not valid, because the NIC's interrupt routine could write to rx.fifo
>>> via rtnetif_rx() while stackmgr_task()
>>> is reading from rx.fifo (e.g. after being triggered by the other NIC's IRQ).
>>> Even with one NIC it is not safe when many short messages
>>> arrive in quick succession.
>> In theory it is safe.
>>
>>> Thus, I think the line above has to be changed to
>>>             skb = rtskb_fifo_remove(&rx.fifo);
>>>
>>> to be really safe!
>>>
>>> I actually had a problem with lost packets with two NICs enabled. I hope
>>> the change above fixes the issue.
>>> At least, the first short test looks promising.
>> Well, not good. That's why using lock-less algorithms is so much fun:
>> it's fairly easy to shoot yourself in the knee without even knowing it...
>>
>> I'm going to rethink this carefully again.
> 
> The number of knots in my brain is already increasing. Before this gets
> critical:
> 
> Could you send me your rtnet.o (or .ko) privately, without the patch
> applied? I need to have a look at the disassembly. BTW, SMP or UP?
> 
> Further question: Did you find any messages in your kernel log after
> losing packets? Something about dropped packets?
> 

Short update for the readers of this list:

Mathias's problem persists. The cause is still unclear, even though the
lock-less FIFO code appears to be correct to both of us. Mathias found
out that once in a while an internal buffer (rtskb) gets used twice,
which corrupts the whole packet reception flow of RTnet.
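
For readers wondering why "we are the only reader => no locking
required" can hold at all, here is a minimal sketch of a
single-producer/single-consumer ring buffer using C11 atomics. It is NOT
RTnet's rtskb FIFO code; the names spsc_fifo, spsc_put, spsc_get and
FIFO_SIZE are hypothetical. The reader only ever advances read_pos and
the writer only ever advances write_pos, so each index has exactly one
owner. This holds only for ONE producer: two concurrent writers (e.g.
two IRQ handlers running at the same time) could claim the same slot and
would have to be serialized on the put side.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SIZE 8u    /* hypothetical size, must be a power of two */
static_assert((FIFO_SIZE & (FIFO_SIZE - 1u)) == 0,
              "FIFO_SIZE must be a power of two");

struct spsc_fifo {
    atomic_size_t read_pos;   /* advanced only by the single consumer */
    atomic_size_t write_pos;  /* advanced only by the single producer */
    void *slot[FIFO_SIZE];
};

static void spsc_init(struct spsc_fifo *f)
{
    atomic_init(&f->read_pos, 0);
    atomic_init(&f->write_pos, 0);
}

/* Producer side (e.g. an IRQ handler). Returns false if the FIFO is full.
 * Assumption: never called concurrently by two producers. */
static bool spsc_put(struct spsc_fifo *f, void *item)
{
    size_t w = atomic_load_explicit(&f->write_pos, memory_order_relaxed);
    /* acquire: the consumer has finished reading any slot it released */
    size_t r = atomic_load_explicit(&f->read_pos, memory_order_acquire);

    if (w - r == FIFO_SIZE)
        return false;                        /* full: caller drops the item */
    f->slot[w & (FIFO_SIZE - 1u)] = item;    /* fill the slot first ... */
    atomic_store_explicit(&f->write_pos, w + 1,
                          memory_order_release); /* ... then publish it */
    return true;
}

/* Consumer side (e.g. a stack manager task). Returns NULL if empty. */
static void *spsc_get(struct spsc_fifo *f)
{
    size_t r = atomic_load_explicit(&f->read_pos, memory_order_relaxed);
    /* acquire: pairs with the producer's release, so the slot is visible */
    size_t w = atomic_load_explicit(&f->write_pos, memory_order_acquire);

    if (r == w)
        return NULL;
    void *item = f->slot[r & (FIFO_SIZE - 1u)];
    /* release: the producer may reuse the slot only after we read it */
    atomic_store_explicit(&f->read_pos, r + 1, memory_order_release);
    return item;
}
```

The unsigned indices are allowed to wrap around; because FIFO_SIZE is a
power of two, both the fill level (w - r) and the slot mask stay correct
across the wrap.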

I tried to reproduce his scenario, also setting up two NICs and heavily
loading the box with parallel data streams (ICMP and UDP). Even after
hours of testing, no problems showed up here (OK, I found and fixed a
minor device dereferencing issue under overload, but that bug was
unrelated). Since our software environments differ considerably (RTAI
3.3-cv over 2.4.32 with gcc-2.95 vs. Xenomai 2.2 over 2.6.17.6 with
gcc-4.1), no real conclusion can be drawn yet.

We will keep you posted.

Jan


_______________________________________________
RTnet-users mailing list
RTnet-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rtnet-users
