2012/4/8 Soren Kristensen <[email protected]>:
> Hi Attila,
>
> Attila Kinali wrote:
>> On Sun, 08 Apr 2012 01:34:58 +0200
>> Soren Kristensen <[email protected]> wrote:
>>
>>> Alvar Kusma wrote:
>>>>
>>>>> As I have stated before, afaik the net5501 does not have any design
>>>>> issues; Attila's problem is most likely either software related,
>>>>
>>>> Please, can you explain why a similar board from PCEngines (Alix 2D13)
>>>> with the same software (OpenWRT image) just works, but the Soekris board
>>>> shows some instability? Can you explain why this exact same software has
>>>> worked on one net5501 without a glitch for over a year now, but two other
>>>> units show signs of instability - random hangs, sometimes running for over
>>>> a month, sometimes crashing twice a day? This is still a mystery to me.
>>>> Just bad luck?
>>>
>>> No, pretty simple:
>>>
>>> The Linux VT6105M driver has interrupt race problems. They are reported to
>>> have been fixed recently; I don't know if the fix has been ported to the
>>> mainline Linux sources.
>>>
>>> The Atheros WLAN drivers also seem to have interrupt race problems; I don't
>>> remember if those have been fixed too.
>>
>> You repeat this argument over and over. But apparently you are the only
>> one who knows about these race conditions. I cannot find any reference
>> to the race condition on the VT6105M at all. And for the ath9k race,
>> the only one I could find was fixed in October 2010 in the mainline kernel.
>> Can you provide us with references to the race conditions you mean and
>> where they are to be found?
>
> From my post on 12/7/2011:
>
> Looking through the archives I found two reported issues, both on Linux.
>
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting
> that the Linux VIA VT6105M driver has a bug, and how to fix it:
>
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

Hey, I tried submitting a patch containing those two lines upstream, which
resulted in some work from Francois Romieu that fixes it the right way. (See
the mail I sent to this list on January 22 requesting help with testing,
which nobody replied to.) Those patches were merged into mainline in Linux
3.3-rc1, so version 3.3 and later contain those fixes, which help fix the
interrupt crashes. So everyone using a kernel below version 3.3 and
complaining about crashes really should have read their mail and tested with
something newer - they had been notified :)

/Bjarke

> 2) And "green" reporting a fix to either ath9k or all wireless drivers,
> in his post on Jan 25, 2011:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html
>
>>
>> But to kill that driver bug argument once and for all, please explain the
>> following, which I've seen during my tests:
>>
>> Setup:
>> net5501 running Debian stable with a self-built vanilla Linux kernel,
>> version 3.2.1. Connected to the net5501 are a notebook SATA hard disk
>> and an AR9200 WLAN card. The LAN is connected on eth0.
>>
>> If the WLAN card is _not_ running (driver not loaded or disabled by rfkill),
>> no problems can be seen. No crashes, nothing. For months.
>>
>> Test #1:
>> Setup as above. WLAN card enabled, traffic going through both WLAN and eth0.
>> Result: System crashes in 2 minutes (+/- 1 minute). No Oops on the serial
>> console, as would be seen with most driver bugs. It just hangs.
>>
>> Test #2:
>> Setup and test procedure as in Test #1, but with two 1000uF capacitors
>> connected to J5 at the 5V and 3.3V power supplies.
>> Result: System crashes in 5 minutes (-1 min, +2 min).
>>
>> Test #3:
>> Setup and test procedure as in Test #2, but with three dozen ceramic
>> capacitors soldered onto the board.
>> Result: No crash at all after one week. Even heavy system load doesn't
>> affect the system anymore.
>>
>>
>> Notes:
>> 1) Test #1 and Test #2 were repeated several dozen times.
>> Although I have not written down the times it takes to crash the system
>> and didn't do a mathematically rigorous statistical analysis, I can state
>> that the difference between Test #1 and #2 is significant, i.e. the
>> additional capacitors improve the situation considerably.
>>
>> 2) I ran Test #1 and #2 before I did the modifications for Test #3 to
>> ensure the bug is still present and can be reproduced. I did not do any
>> software upgrades or any configuration changes in between, i.e. if it were
>> a software bug, it would be present in all three tests.
>>
>>
>> Soren, if you really have an explanation of how a software bug (a race
>> condition, as you say) can be fixed with a soldering iron, I would really
>> like to hear it. I have systems that experience race conditions every once
>> in a while and I'd like to fix those with my soldering iron as well.
>
> Attila, thanks for the detailed testing. I agree with you that adding
> capacitors should not change the behavior if it's a software problem alone.
>
> I will still state that the net5501 has the decoupling it needs for
> itself and the expansions it's designed for. One possible source of the
> problem could be the power supply regulators, as they are located just
> behind the mini-PCI slot; RF could be affecting e.g. the compensation
> circuit, so adding decoupling capacitors just fixes the symptoms.
>
> I would also like to investigate the problem further. Can you please
> tell me the exact WLAN card?
>
> And can you please ensure that the vt6105 driver is updated to a fixed
> one? I would really love data after that is done....
>
> I still have the problem that nobody running FreeBSD or OpenBSD has
> reported similar issues; somebody correct me if I'm wrong.
>
>
> Best Regards,
>
>
> Soren Kristensen
>
> CEO & Chief Engineer
> Soekris Engineering, Inc.
> _______________________________________________
> Soekris-tech mailing list
> [email protected]
> http://lists.soekris.com/mailman/listinfo/soekris-tech
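For anyone who wants to check whether a box already runs a kernel carrying the VT6105M interrupt fixes (merged in 3.3-rc1, so 3.3 and later per Bjarke's note), here is a minimal sketch in Python. It assumes the vanilla "MAJOR.MINOR..." release format returned by platform.release(); the helper name kernel_at_least is hypothetical. Distribution kernels may backport fixes, so a version number below 3.3 does not by itself prove the fix is missing.

```python
import platform
import re
from typing import Tuple


def kernel_at_least(release: str, minimum: Tuple[int, int] = (3, 3)) -> bool:
    """Return True if a kernel release string such as '3.2.1' or
    '3.3.0-rc1' has a MAJOR.MINOR version >= `minimum`.

    Assumption: the string starts with 'MAJOR.MINOR' (vanilla numbering).
    A release candidate like '3.3.0-rc1' counts as 3.3, which matches the
    fixes having been merged in 3.3-rc1.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    # Compare numerically, so '3.10' correctly sorts after '3.3'.
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


if __name__ == "__main__":
    release = platform.release()  # e.g. '3.2.1' on the test box described above
    if kernel_at_least(release):
        print(f"{release}: 3.3 or newer (vanilla numbering)")
    else:
        print(f"{release}: older than 3.3, consider testing a newer kernel")
```

Comparing (major, minor) as integer tuples avoids the string-comparison trap where "3.10" would sort before "3.3".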
