Hi Attila,

Attila Kinali wrote:
> On Sun, 08 Apr 2012 01:34:58 +0200
> Soren Kristensen wrote:
>
>> Alvar Kusma wrote:
>>>
>>>> As I have stated before, AFAIK the net5501 does not have any design
>>>> issues, Attila's problem is most likely either software related,
>>>
>>> Please, can you explain why a similar board from PCEngines (Alix 2D13)
>>> with the same software (OpenWRT image) just works, but the Soekris board
>>> shows some instability? Can you explain why this exact same software has
>>> worked on one net5501 without a glitch for over a year now, but two other
>>> units show signs of instability: random hangs, sometimes running for over
>>> a month, sometimes crashing twice a day? This is still a mystery to me.
>>> Just bad luck?
>>
>> No, pretty simple:
>>
>> The Linux VT6105M driver has interrupt race problems, which were reported
>> to have been fixed recently; I don't know if the fix has been ported to the
>> mainline Linux sources.
>>
>> The Atheros WLAN drivers also seem to have interrupt race problems;
>> I don't remember if those have been fixed too.
>
> You repeat this argument over and over. But apparently, you are the only
> one who knows about these race conditions. I cannot find any reference
> to a race condition in the VT6105M driver at all. And for the ath9k race,
> the only one I could find was fixed in October 2010 in the mainline kernel.
> Can you provide us with references to the race conditions you mean and
> where they can be found?

From my post on 12/7/2011:

Looking through the archives, I found two reported issues, both on Linux.

1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting 
that the Linux VIA VT6105M driver has a bug, and how to fix it:

http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

2) And "green" reporting a fix to either ath9k or all wireless drivers 
in his post on Jan 25, 2011:

http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

>
> But to kill that driver bug argument once and for all, please explain the
> following, which I've seen during my tests:
>
> Setup:
> net5501 running Debian/stable with a self-built vanilla Linux kernel,
> version 3.2.1. Connected to the net5501 are a notebook SATA hard disk
> and an AR9200 WLAN card. The LAN is connected on eth0.
>
> If the WLAN card is _not_ running (driver not loaded or disabled by rfkill),
> no problems can be seen. No crashes, nothing. For months.
>
> Test #1:
> Setup as above. WLAN card enabled, traffic going through both WLAN and eth0.
> Result: System crashes in 2 minutes (+/- 1 minute). No Oops on the serial
> console, as would be seen with most driver bugs. It just hangs.
>
> Test #2:
> Setup and test procedure as in Test #1, but with two 1000uF capacitors
> connected to J5 at 5V and 3.3V power supplies.
> Result: System crashes in 5 minutes (-1 min, +2 min).
>
> Test #3:
> Setup and test procedure as in Test #2, but with three dozen ceramic capacitors
> soldered on the board.
> Result: No crash at all after one week. Even heavy system load doesn't
> affect the system anymore.
>
>
> Notes:
> 1) Test #1 and Test #2 were repeated several dozen times. Although I have not
> written down the times it takes to crash the system and didn't do a
> mathematically rigorous statistical analysis, I can state that the
> difference between Test #1 and #2 is significant, i.e. the additional
> capacitors improve the situation considerably.
>
> 2) I ran Tests #1 and #2 before I did the modifications for Test #3, to ensure
> the bug is still present and can be reproduced. I did not do any software
> upgrades or configuration changes in between, i.e. if it were a software
> bug, it would be present in all three tests.
>
>
> Soren, if you really have an explanation of how a software bug (a race
> condition, as you say) can be fixed with a soldering iron, I would really
> like to hear it. I have systems that experience race conditions every
> once in a while, and I'd like to fix those with my soldering iron as
> well.

Attila, thanks for the detailed testing. I agree with you that 
adding capacitors should not change the behavior if it's a software 
problem alone.

I will still state that the net5501 has the decoupling it needs for 
itself and the expansions it's designed for. One possible source of the 
problem could be the power supply regulators, as they are located just 
behind the mini-PCI slot: RF could be affecting, e.g., the compensation 
circuit, in which case adding decoupling capacitors just fixes the symptoms.

I would also like to investigate the problem further. Can you please 
tell me the exact WLAN card model?

And can you please make sure the VT6105 driver is updated to a fixed 
version? I would really love to see data after that is done.

I still have the problem that nobody running FreeBSD or OpenBSD has 
reported similar issues; somebody correct me if I'm wrong.


Best Regards,


Soren Kristensen

CEO & Chief Engineer
Soekris Engineering, Inc.
_______________________________________________
Soekris-tech mailing list
http://lists.soekris.com/mailman/listinfo/soekris-tech
