Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sat, 07 Apr 2012 09:27:53 -0700 JSL Internet wrote:

> Thank you for the detailed post. It is clear that you have some
> understanding of best practices regarding digital PCB layout. However,
> you've made a number of assumptions and generalizations. Are you
> certain that there are no power and ground layers within the net5501
> multi-layer board? I would be surprised if this were true, but I'll
> leave it up to Soren to clarify.

Uhm? I never said that there are no power planes. On the contrary, I'm
pretty sure the net5501 has power planes. If you look at the board,
especially at the layer beneath the surface, you don't see any tracks;
actually you don't see any structure. This is a 99% sure sign that you
are looking at a plane. Of course, something could block the view down
to the next layer and give the impression of a plane, but that's very,
very unlikely.

> The photo you posted shows four bypass
> capacitors surrounding the RAM chip in very close proximity. It would
> be difficult for any layout designer to get them any closer to the
> chip. Sharing capacitors is just fine as long as there are separate
> runs from the capacitor to each chip. The whole point of the capacitors
> is to offset the inductance of the PCB power runs. It is physically
> impossible to have zero-length runs from the capacitors to the chips.
> The chips have small internal capacitors anyway. Your photo
> demonstrates a good PCB design.

I did not say that the capacitors are far away. They are near enough.
My criticism is that there are not enough of them. There are several
power supply pins per chip for a reason, and you are advised to put a
blocking capacitor at each of them. The board would have had enough
space for more capacitors. Hence I say that the board clearly shows
_bad_ PCB design.

> If I were having problems with a small SBC (net5501) that only
> occurred when it was attached to an RF transmitter (your WLAN card), I
> would be looking at the RF susceptibility of the SBC, and the isolation
> of the transmission line and transmitting elements from the SBC. It's
> doubtful that Soren did any RF susceptibility testing or analysis of the
> computer. Most computer manufacturers do not bother with such things.
> The metal boxes he sells for the SBCs should take care of most of the
> problems anyway. It's entirely likely that the problems that you've
> "fixed" are related to the RF susceptibility of the net5501. Small
> pieces of carefully applied brass or mu-metal in selected circuit areas
> would probably have accomplished the same thing that you did by adding
> some capacitors.

No, RF susceptibility looks different. Besides, if the board were
susceptible to RF emissions from a WLAN card with which it is sold on
the Soekris website, I would say that something is very wrong.

But let's focus on technical arguments: As you said, you are an
experienced RF and digital engineer, so you know for sure that wires
that run unprotected by a nearby ground plane and/or run in large loops
are susceptible to EMI. I cannot see any such wires/tracks on the board
at all. All run straight and quite short. Besides, the signals are all
digital, which means that the field strength needed to cause problems
would have to be large. Larger than a hermetically shielded RF module
can produce.

And what really destroys that argument is that soldering capacitors at
the power supply pins fixes the crashes. A power supply is very low
impedance; to get even a microvolt of induced voltage into such a wire
(which runs a couple of mm from the plane up to the pin), you need
multiple watts of directed signal. That's something you don't have in
such a system.

			Attila Kinali

-- 
Why does it take years to find the answers to
the questions one should have asked long ago?
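
For a sense of scale on the supply-pin argument above: the voltage drop
across a short supply connection during a current spike can be estimated
as V = L * di/dt. The sketch below is only illustrative and is not part
of the original post. All numbers are assumptions: roughly 1 nH per mm is
a common rule of thumb for a short wire or via, 2 mm stands in for "a
couple of mm", the 300 mA step is the wlan spike size guessed elsewhere
in this thread, and the 1 ns edge time is chosen for illustration.

```python
def supply_droop_v(inductance_h: float, current_step_a: float, rise_time_s: float) -> float:
    """Estimate the voltage drop across an inductance for a linear current ramp.

    Uses V = L * di/dt with di/dt approximated as current_step / rise_time.
    """
    if rise_time_s <= 0:
        raise ValueError("rise time must be positive")
    return inductance_h * current_step_a / rise_time_s


# All values below are assumptions for illustration, not measurements.
WIRE_LENGTH_MM = 2.0   # "a couple of mm" from plane to pin
NH_PER_MM = 1.0        # rule-of-thumb inductance of a short wire/via
CURRENT_STEP_A = 0.3   # spike size guessed in the thread for the wlan card
RISE_TIME_S = 1e-9     # assumed edge time of the spike

inductance_h = WIRE_LENGTH_MM * NH_PER_MM * 1e-9
droop_v = supply_droop_v(inductance_h, CURRENT_STEP_A, RISE_TIME_S)
print(f"estimated droop: {droop_v:.2f} V")
```

With these assumed values the estimate comes out at about 0.6 V, which is
large compared to a 3.3 V rail. A capacitor placed right at the pin
supplies the spike locally, so the current step through the longer path
is much smaller.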
___ Soekris-tech mailing list Soekris-tech@lists.soekris.com http://lists.soekris.com/mailman/listinfo/soekris-tech
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
2012/4/11 Svenning Sørensen:
>
> On 10-04-2012 15:12, Jan Ceuleers wrote:
>> Svenning Sørensen wrote:
>>> Two machines sending high rates of UDP packets to each other would make
>>> the box hang rather fast, often within a few seconds under certain
>>> loads.
>>> I made a few fixes to the 3.2.13 driver which seem to have solved the
>>> hangs as well as another major problem (the one addressed in 3.3),
>>> namely missed timer ticks due to too much work in the via-rhine
>>> interrupt handler.
>>> It has now been running for a week under high load without hanging at
>>> all.
>>
>> Svenning,
>>
>> Would you consider submitting this to netdev once you believe the
>> problem is solved?
>>
>> Thanks, Jan
>
> Actually it seems they've already spotted it on the netdev list :)

I really wish people had picked it up back in January and helped catch
some of the bugs then, so we wouldn't have to wait for 3.4 before it
becomes really usable :o)

If people could give v3.4-rc2 with this patch applied a run, it would
be really helpful:

http://patchwork.ozlabs.org/patch/151700/

So far it seems stable to me, but my setup most likely won't catch every
problem.

/Bjarke

> Svenning
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On 10-04-2012 15:12, Jan Ceuleers wrote:
> Svenning Sørensen wrote:
>> Two machines sending high rates of UDP packets to each other would make
>> the box hang rather fast, often within a few seconds under certain
>> loads.
>> I made a few fixes to the 3.2.13 driver which seem to have solved the
>> hangs as well as another major problem (the one addressed in 3.3),
>> namely missed timer ticks due to too much work in the via-rhine
>> interrupt handler.
>> It has now been running for a week under high load without hanging at
>> all.
>
> Svenning,
>
> Would you consider submitting this to netdev once you believe the
> problem is solved?
>
> Thanks, Jan

Actually it seems they've already spotted it on the netdev list :)

Svenning
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Svenning Sørensen wrote:
> Two machines sending high rates of UDP packets to each other would make
> the box hang rather fast, often within a few seconds under certain loads.
> I made a few fixes to the 3.2.13 driver which seem to have solved the
> hangs as well as another major problem (the one addressed in 3.3),
> namely missed timer ticks due to too much work in the via-rhine
> interrupt handler.
> It has now been running for a week under high load without hanging at all.

Svenning,

Would you consider submitting this to netdev once you believe the
problem is solved?

Thanks, Jan
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On 08-04-2012 22:54, Soren Kristensen wrote:
> And it is a fact that the Linux VT6105M driver had bugs, reported to be
> fixed in Linux 3.3. And that the atheros and/or wlan drivers had/have
> bugs, don't know if fixed. Upgrade to Linux >= 3.3 and we can continue
> talking.

I have observed the via-rhine bugs as well - not only on net5501, but
also on alix boards. While some of these bugs have been fixed in Linux
3.3, the most critical one (random lockup) is still there (and new bugs
have sneaked in, such as random loss of link).

Two machines sending high rates of UDP packets to each other would make
the box hang rather fast, often within a few seconds under certain
loads.

I made a few fixes to the 3.2.13 driver which seem to have solved the
hangs as well as another major problem (the one addressed in 3.3),
namely missed timer ticks due to too much work in the via-rhine
interrupt handler. It has now been running for a week under high load
without hanging at all.

I've attached a patch, but I don't know if the list server will allow
it..

An *important* thing to be aware of is that it doesn't work reliably
unless MMIO mode (CONFIG_VIA_RHINE_MMIO) is disabled! I don't know if
that is a driver issue or a hardware issue (could be the VT6105M chip
itself or the PCI bridge inside the Geode causing trouble with posted
writes or write combining or some such), but in any case I don't think
anyone could blame Søren for this.

Svenning

--- via-rhine.c.orig	2012-03-23 21:54:45.0 +0100
+++ via-rhine.c	2012-04-10 12:22:15.0 +0200
@@ -76,7 +76,8 @@
    There are no ill effects from too-large receive rings. */
 #define TX_RING_SIZE	16
 #define TX_QUEUE_LEN	10	/* Limit ring entries actually used. */
-#define RX_RING_SIZE	64
+#define RX_RING_SIZE	16
+#define RX_NAPI_WEIGHT	8
 
 /* Operational parameters that usually are not changed. */
 
@@ -781,6 +782,7 @@
 	pioaddr = pci_resource_start(pdev, 0);
 	memaddr = pci_resource_start(pdev, 1);
 
+	pci_set_mwi(pdev);
 	pci_set_master(pdev);
 
 	dev = alloc_etherdev(sizeof(struct rhine_private));
@@ -868,7 +870,7 @@
 	dev->ethtool_ops = &netdev_ethtool_ops,
 	dev->watchdog_timeo = TX_TIMEOUT;
 
-	netif_napi_add(dev, &rp->napi, rhine_napipoll, 64);
+	netif_napi_add(dev, &rp->napi, rhine_napipoll, RX_NAPI_WEIGHT);
 
 	if (rp->quirks & rqRhineI)
 		dev->features |= NETIF_F_SG|NETIF_F_HW_CSUM;
@@ -1019,6 +1021,7 @@
 				       PCI_DMA_FROMDEVICE);
 
 		rp->rx_ring[i].addr = cpu_to_le32(rp->rx_skbuff_dma[i]);
+		wmb();
 		rp->rx_ring[i].rx_status = cpu_to_le32(DescOwn);
 	}
 	rp->dirty_rx = (unsigned int)(i - RX_RING_SIZE);
@@ -1475,6 +1478,7 @@
 	void __iomem *ioaddr = rp->base;
 	unsigned entry;
 	unsigned long flags;
+	int txstatus;
 
 	/* Caution: the write order is important here, set the field
 	   with the "ownership" bits last. */
@@ -1485,6 +1489,14 @@
 	if (skb_padto(skb, ETH_ZLEN))
 		return NETDEV_TX_OK;
 
+	IOSYNC;
+	txstatus = le32_to_cpu(rp->tx_ring[entry].tx_status);
+	rmb();
+	if (unlikely(txstatus & DescOwn)) {
+		netdev_warn(dev, "Tx descriptor busy\n");
+		return NETDEV_TX_BUSY;
+	}
+
 	rp->tx_skbuff[entry] = skb;
 
 	if ((rp->quirks & rqRhineI) &&
@@ -1518,17 +1530,17 @@
 		cpu_to_le32(TXDESC | (skb->len >= ETH_ZLEN ? skb->len : ETH_ZLEN));
 
 	if (unlikely(vlan_tx_tag_present(skb))) {
-		rp->tx_ring[entry].tx_status = cpu_to_le32((vlan_tx_tag_get(skb)) << 16);
+		txstatus = (vlan_tx_tag_get(skb) << 16) | DescOwn;
 		/* request tagging */
 		rp->tx_ring[entry].desc_length |= cpu_to_le32(0x02);
 	}
 	else
-		rp->tx_ring[entry].tx_status = 0;
+		txstatus = DescOwn;
 
 	/* lock eth irq */
 	spin_lock_irqsave(&rp->lock, flags);
 	wmb();
-	rp->tx_ring[entry].tx_status |= cpu_to_le32(DescOwn);
+	rp->tx_ring[entry].tx_status = cpu_to_le32(txstatus);
 	wmb();
 
 	rp->cur_tx++;
@@ -1634,6 +1646,8 @@
 	/* find and cleanup dirty tx descriptors */
 	while (rp->dirty_tx != rp->cur_tx) {
 		txstatus = le32_to_cpu(rp->tx_ring[entry].tx_status);
+		rmb();
+
 		if (debug > 6)
 			netdev_dbg(dev, "Tx scavenge %d status %08x\n",
 				   entry, txstatus);
@@ -1652,12 +1666,8 @@
 				dev->stats.tx_aborted_errors++;
 			if (txstatus & 0x0080)
 				dev->stats.tx_heartbeat_errors++;
-			if (((rp->quirks & rqRhineI) && txstatus & 0x0002) ||
-			    (txstatus & 0x0800) || (txstatus & 0x1000)) {
+
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, April 8, 2012 07:59, Soren Kristensen wrote:
> I still have the problem that nobody running FreeBSD and OpenBSD have
> reported similar issues, somebody correct me if I'm wrong.

I don't know if it counts, as I use a PCI Atheros card. But I have:

FreeBSD cygnus.apartnet 8.1-RELEASE-p4 FreeBSD 8.1-RELEASE-p4 #0: Tue Sep 13 18:02:33 EDT 2011
    root@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_wrap.8.i386 i386

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-RELEASE-p4 #0: Tue Sep 13 18:02:33 EDT 2011
    root@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_wrap.8.i386 i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Geode(TM) Integrated Processor by AMD PCS (499.90-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x5a2  Family = 5  Model = a  Stepping = 2
  Features=0x88a93d
  AMD Features=0xc040
real memory  = 536870912 (512 MB)
avail memory = 506286080 (482 MB)

and

ath0@pci0:0:14:0: class=0x02 card=0x3a131186 chip=0x0013168c rev=0x01 hdr=0x00
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xa001, size 65536, enabled

Dlink DWL-520 card

and

hifn0 mem 0xa002-0xa0020fff,0xa0022000-0xa0023fff,0xa0028000-0xa002 irq 15 at device 17.0 on pci0
hifn0: [ITHREAD]
hifn0: Hifn 7955, rev 0, 32KB dram, pll=0x801

and never had a crash for this reason.

I never tested Linux though.

matheus

-- 
We will call you Cygnus,
The God of balance you shall be

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

http://en.wikipedia.org/wiki/Posting_style
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
> Looking through the archives I found two reported issues, both on Linux.
>
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting
> the Linux VIA VT6105M driver to have bug, and how to fix it:
>
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

I tried this a long time ago and my box was still crashing. But it
could be that it's because of the separate wifi issue.

> 2) And "green" reporting a fix to either ath9k, or all wireless drivers,
> in his post on Jan 25, 2011:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

I have also tried the newest possible wifi drivers and still get
crashes when using the mini-pci slot.
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, 08 Apr 2012 22:54:42 +0200 Soren Kristensen wrote:

> A quick and easy way to check power supply issues is to lower the TX
> power on the wlan card, have you tried that ?

Yes. I went down with the TX power until I hardly got a connection.
My subjective impression was that it helps a tiny bit, i.e. I think
it took a little longer until it crashed. But the difference was
below one minute, so I wouldn't trust that.

> > Please note that I had to use different antennas, as the net5501 has
> > no space to drill holes for the TNC connectors. Why you sell them
> > together is beyond me...
>
> eeh, we don't sell them, Soekris EU, which are a different company,
> sells them.

Oh. But they use your name. I think you should make clear somewhere
that soekris.eu is not a subsidiary of Soekris. Otherwise people like
me will think that you are selling stuff in Europe directly.

			Attila Kinali

-- 
Why does it take years to find the answers to
the questions one should have asked long ago?
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hi Attila,

Sometimes your postings are well thought out, like the one at 12:03
today, but then you do postings like this where it sounds like you have
no idea what you're doing

Attila Kinali wrote:

..
>
> Attached is a picture of a DDR SRAM chip mounted at the bottom of
> the net5501. For your convenience, I marked the power supply pins red
> (for VDD and VDDQ) and the ground pins blue (for the VSS and VSSQ). I
> used this, because it's one of the best examples for what I want to
> show and it has very little circuitry around that would distract or
> make my point less clear.
>
> The first thing that strikes the eye is that there are only 4
> capacitors around the chip while it has 8 power supply pins. And even
> worse, those 4 capacitors are shared with the two adjacent SRAM
> chips (effectively halving the number of capacitors "seen" by a
> chip).
>
> The next thing you should notice is that there isn't a via visible
> for each power or ground pin. This suggests that only one via
> (underneath the chip) has been used to connect the pin to its
> power/ground plane. You can also spot places where the same via is
> used for two pins.

..

> That's the theory.
>
> Now to the practical stuff:
>
> Because of the "spiky" current consumption of digital logic it has
> become custom in the field of electronics to attach a 100nF
> capacitor to each power supply pin, to ensure the power supply has a
> low inductance and low resistance "power source" for the switching
> time. This has been done since at least the 1970s, when the first
> 74xx logic family appeared. You can see this still in DIL sockets
> sold with integrated 100nF capacitors. The capacitor is connected
> directly between a power and a ground pin if possible, to ensure
> minimal resistance between the capacitor and the chip. You cannot
> group those capacitors together at one pin and just connect the other
> pins to the power supply and ground, because the wires and vias will
> have a resistance and (more importantly) an inductance that cannot be
> neglected. For fast digital chips, which have very high frequency
> components on the power supply pins, it became custom to connect a
> 10nF capacitor directly at the pin and a 100nF adjacent to it. This
> is because even those tiny capacitors have an inductance. And due to
> the internal structure this inductance becomes dominating above the
> so-called self resonance frequency. This self resonance frequency is
> higher for smaller value capacitors, making them better suited for
> high frequency applications. The larger capacitor is then used to
> provide the energy, while the smaller "eats" the spikes.
>
> Also, for high current chips like SRAM chips, you generally use a
> larger capacitor (somewhere in the range of 1-10uF) adjacent to the
> chip, to catch the lower frequency components, or the bumps so to
> speak, that the 100nF capacitors couldn't catch. The placement of
> this capacitor is not so critical as it is "only" for the "low"
> frequency components. But it should still be as near to the chip as
> possible, and one capacitor per chip.
>
> Additionally, each power supply and ground pin is connected to their
> planes in the middle of the board by two vias. This is done to reduce
> the inductance that a via has. Using two vias in parallel halves the
> inductance.
>
> Ignoring these common engineering practices is generally a bad idea.
> It will lead to so-called ground bounces, where the local power
> supply voltage at the chip decreases, due to inductance and
> resistance in the wires/vias to the chip. And even worse: because the
> inductance/resistance at the power supply and ground pins is not the
> same, the chip's voltage level will bounce around wildly depending on
> how much current is flowing where. These ground bounces lead at best
> to a decreased signal to noise ratio (higher bit error rate) and
> intermediately to bit errors. But in the worst case, it will lead to
> the chip entering an improper operating state, where it becomes
> dysfunctional (either not doing anything anymore or doing wild things
> it shouldn't do, potentially leading to the destruction of itself or
> other chips).
>
> You also do not share power supply and ground pins of chips, of which
> you cannot ensure that they are switching at different times. In this
> case, the SRAM chips will switch exactly at the same time, making the
> ground bounce problem even worse.

Sorry Attila, but you can't really apply design techniques from the TTL
era to modern high speed designs with power planes, nowadays it's all a
question about two things:

1) Inductance from planes to the chip die itself.

2) The planes' impedance from low to very high frequencies.

It doesn't really matter how many capacitors there are close to the
chip if the pins are connected to low impedance planes. And you can
safely share vias if needed, the reason a DDR DRAM chip has many power
pins is to reduce the inductance from plane to chip, on a
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hi Attila,

Attila Kinali wrote:
>
> Could be. I think that's unlikely. I rather think that the wlan card
> has a spiky power need, probably drawing ~300mA for very short periods.
> The VIA6105M should have about the same spikiness, but with less power
> consumption. As both share their power supply vias and capacitors, it's
> only a matter of time until both spike at the same time, moving one or
> both subsystems into an operating condition outside the specs.
>
>> I would also like to investigate the problem further. Can you please
>> tell me the exact wlan card ?
>
> I bought the following from soekris.eu:
>
> Complete wireless Bundle 9220-2
> Power Supply, 12V, 3.0A, IEC320-C8 inlet 90V-264V Worldwide
> 2.5" SATA hard drive mounting kit for the net5501
> net5501-70 Board and 1 Slot standard Case

According to their website it's a Compex WLM200NX. Those are specified
to draw up to 2.5W during TX. Worst case peak should then be 0.75A,
assuming the card has reasonably good high speed decoupling, which is
way inside the net5501 design specs.

A quick and easy way to check power supply issues is to lower the TX
power on the wlan card, have you tried that ?

> Please note that I had to use different antennas, as the net5501 has
> no space to drill holes for the TNC connectors. Why you sell them
> together is beyond me...

eeh, we don't sell them, Soekris EU, which are a different company,
sells them.


Best Regards,


Soren Kristensen

CEO & Chief Engineer
Soekris Engineering, Inc.
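
The worst-case current quoted above follows from I = P / V. A minimal
sketch of that arithmetic, assuming the card is fed from the 3.3 V
Mini PCI rail (the rail voltage is an assumption here; the message only
gives the 2.5 W figure):

```python
def load_current_a(power_w: float, rail_v: float) -> float:
    """Return the current in amperes drawn by a load of power_w watts at rail_v volts."""
    if rail_v <= 0:
        raise ValueError("rail voltage must be positive")
    return power_w / rail_v


TX_POWER_W = 2.5  # from the message: specified maximum draw during TX
RAIL_V = 3.3      # assumption: card powered from the 3.3 V Mini PCI rail

current_a = load_current_a(TX_POWER_W, RAIL_V)
print(f"worst-case TX current: {current_a:.2f} A")
```

Under that assumption the result is close to the 0.75A mentioned in the
message. Note that this is an average over the TX burst; it says nothing
about how fast the current rises, which is what the decoupling has to
cope with.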
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
2012/4/8 Soren Kristensen:
> Hi Attila,
>
> Attila Kinali wrote:
>> On Sun, 08 Apr 2012 01:34:58 +0200
>> Soren Kristensen wrote:
>>
>>> Alvar Kusma wrote:
>>>>> As I have stated before, afaik the net5501 do not have any design
>>>>> issues, Attila's problem is most likely either software related,
>>>> Please, can you explain why a similar board from PCEngines (Alix
>>>> 2D13) with the same software (OpenWRT image) just works, but the
>>>> Soekris board shows some instability? Can you explain why this same
>>>> exact software has worked on one net5501 without a glitch for over
>>>> a year now, but two other units show signs of instability - random
>>>> hangs, sometimes works over a month, sometimes crashes 2 times a
>>>> day? This is still a mystery for me. Just bad luck?
>>>
>>> No, pretty simple:
>>>
>>> The Linux VT6105M driver has interrupt race problems, reported to have
>>> been fixed recently, don't know if it has been ported to the main
>>> Linux sources.
>>>
>>> The Atheros wlan drivers seem to also have interrupt race problems,
>>> don't remember if that has been fixed too.
>>
>> You repeat this argument over and over. But apparently, you are the only
>> one who knows about these race conditions. I cannot find any reference
>> to the race condition on the VT6105M at all. And for the ath9k race,
>> the only one I could find was fixed October 2010 in the mainline kernel.
>> Can you provide us with references to what race conditions you mean and
>> where they are to be found?
>
> From my post on 12/7/2011:
>
> Looking through the archives I found two reported issues, both on Linux.
>
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting
> the Linux VIA VT6105M driver to have bug, and how to fix it:
>
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

Hey, I tried submitting a patch containing those two lines upstream,
resulting in some work from Francois Romieu that fixes it the right
way. (See the mail I sent to this list on January 22 requesting help
testing, which nobody replied to.)

Those patches were merged into mainline in linux 3.3-rc1, so version
3.3 and forward contains those fixes, which help fix the interrupt
crashes.

So everyone using a kernel below version 3.3 and complaining about
crashes really should have read their mail and tested with something
newer - they had been notified :)

/Bjarke

> 2) And "green" reporting a fix to either ath9k, or all wireless drivers,
> in his post on Jan 25, 2011:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html
>
>>
>> But to kill that driver bug argument once and for all please explain the
>> following which I've seen during my test:
>>
>> Setup:
>> net5501 running debian/stable with a self-built vanilla linux kernel
>> version 3.2.1. Connected to the net5501 are a notebook sata harddisk
>> and AR9200 wlan card. The LAN is connected on eth0.
>>
>> If the WLAN card is _not_ running (driver not loaded or disabled by rfkill)
>> no problems can be seen. No crashes, nothing. For months.
>>
>> Test #1:
>> Setup as above. WLAN card enabled, traffic going through both WLAN and eth0.
>> Result: System crashes in 2 minutes (+/- 1 minute). No Oops, as would be
>> seen with most driver bugs on the serial console. It just hangs.
>>
>> Test #2:
>> Setup and test procedure as in Test #1, but with two 1000uF capacitors
>> connected to J5 at 5V and 3.3V power supplies.
>> Result: System crashes in 5 minutes (-1min, +2min).
>>
>> Test #3:
>> Setup and test procedure as in Test #2, but with three dozen ceramic
>> capacitors soldered on the board.
>> Result: No crash at all after one week. Even heavy system load doesn't
>> affect the system anymore.
>>
>>
>> Notes:
>> 1) Test #1 and Test #2 were repeated several dozen times. Although I have
>> not written down the times it takes to crash the system and didn't do a
>> mathematically rigorous statistical analysis, I can state that the
>> difference between Test #1 and #2 is significant. I.e. the additional
>> capacitors improve the situation considerably.
>>
>> 2) I ran Test #1 and #2 before I did the modifications for Test #3 to
>> ensure the bug is still present and can be reproduced. I did not do any
>> software upgrades or any configuration changes in between. I.e. if it
>> were a software bug, it would be present in all three tests.
>>
>>
>> Soren, if you really have an explanation how a software bug (a race
>> condition as you say) can be fixed with a soldering iron, I'd really like
>> to hear that. I have systems that experience race conditions
>> every once in a while and I'd like to fix those as well with my soldering
>> iron.
>
> Attila, thanks for the detailed testing done. I agreed with you that
> adding capacitors should not change behavior if it's a software problem
> alone.
>
> I will still state that the net5501 has the decoupling it needs for
> itself and the
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On 08/04/12 13:22, Attila Kinali wrote:
> Could be. I think that's unlikely. I rather think that the wlan card
> has a spiky power need, probably drawing ~300mA for very short periods.
> The VIA6105M should have about the same spikiness, but with less power
> consumption. As both share their power supply vias and capacitors, it's
> only a matter of time until both spike at the same time, moving one or
> both subsystems into an operating condition outside the specs.

When I had stability problems with my net5501 I solved them by no
longer using the first two ethernet ports. I now only use eth2 and eth3
and the box is rock solid.

http://lists.soekris.com/pipermail/soekris-tech/2010-December/016953.html

As you can see I reported this problem solved in December 2010 (by no
longer using the first two Ethernet ports). I've not looked back since;
I could test again if anyone's interested.

I have an Atheros (ath9k) wireless card:

00:11.0 Network controller: Atheros Communications Inc. AR922X Wireless Network Adapter (rev 01)
	Subsystem: Atheros Communications Inc. Device 2096
	Flags: bus master, 66MHz, medium devsel, latency 168, IRQ 15
	Memory at a001 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: ath9k
	Kernel modules: ath9k

Currently using kernel 2.6.32-40-generic; I'll be upgrading to Ubuntu
12.04 LTS once it's been out for a while.

Jan
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On 04/08/12 13:26, Alvar Kusma wrote:
>> I still have the problem that nobody running FreeBSD and OpenBSD have
>> reported similar issues, somebody correct me if I'm wrong.
>
> Just to point one:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-December/017988.html

You can add another to the list too - FreeBSD 8.x, no WLAN card, just
the DP83816 4 port card. Happy to provide more anecdotal evidence if
requested. Wish I could offer more.

I say it would be nice if Søren and Attila could collaborate more
closely. Relying on watchdogs feels like such a dirty hack. C'mon guys,
this is science! :)


Regards,
Aragon
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Wed, 4 Apr 2012 11:01:51 +0900 Alan wrote:

> On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali wrote:
>
> > I finally got the time to work on this again and got my net5501
> > working without crashes even under heavy load and using wlan at full
> > power. At least not within 24h.
>
> Great that someone is working on this, but 24 hours is not much. If I
> recall correctly one of my net5501 could go up to 2 weeks (of light
> use) without crashing.

Oh.. compared to 2 minutes (!) for an unmodified board or 5 minutes
with only two 1000uF electrolytic capacitors connected to J5 it's
pretty impressive. Besides, it was last weekend that I did the
modifications, so I didn't have the time to let it run for longer.
Now I have had it running for almost a week with no crashes.

> Like others have said, I am eagerly waiting for the description,
> schematics and pictures to fix this.

I didn't take any pictures of the modified board, but I can tell you
what the cause is in more detail:

The power supply of the net5501 and how it is distributed to the
circuitry disregards all common good design practices, hence leading
to problems in certain load and use conditions. These problems can be
bit errors or complete crashes.

Attached is a picture of a DDR SRAM chip mounted at the bottom of
the net5501. For your convenience, I marked the power supply pins red
(for VDD and VDDQ) and the ground pins blue (for the VSS and VSSQ). I
used this, because it's one of the best examples for what I want to
show and it has very little circuitry around that would distract or
make my point less clear.

The first thing that strikes the eye is that there are only 4
capacitors around the chip while it has 8 power supply pins. And even
worse, those 4 capacitors are shared with the two adjacent SRAM
chips (effectively halving the number of capacitors "seen" by a
chip).

The next thing you should notice is that there isn't a via visible
for each power or ground pin. This suggests that only one via
(underneath the chip) has been used to connect the pin to its
power/ground plane. You can also spot places where the same via is
used for two pins.

Now, what does that mean?

Digital chips are beasts in terms of power supply. Most of the time,
they do not draw any power (at least nothing you'd talk about), but
when the clock switches from high to low (or low to high, or both),
they draw a huge amount of power. One part of that power is used to
switch the transistors inside the chip, another part goes into the
switching of the output pins.

Simplified, you can see the internal circuitry and the output pins as
a CMOS inverter [1]. When A changes its logic level, there is first
the gate capacitance that has to be charged/discharged. Second, there
is a very short period when both transistors are conducting, leading
to the so-called shoot-through current. This current is limited by the
conductance properties of the transistors themselves. For internal
circuits it's quite low (they don't have to conduct huge currents),
but due to the number of transistors switching at the same time, this
cannot be neglected. For the output pins it's a different matter. They
are designed to provide large currents (at least 16mA per pin in the
DDR SRAM case). So the shoot-through current is significant for each
pin and the situation becomes worse when multiple pins are switching
at the same time.

Please keep in mind that the shoot-through current lasts only for a
very short period of time, typically less than 1ns. On one hand, this
helps, as only little energy is lost by the shoot-through. But on the
other hand, it leads to very high frequency components.

The next big power hog comes from the capacitance connected to the
chip. Each pin of a chip case has a capacitance in the order of
1-20pF. (DDR SRAM chips have a pin capacitance of <5pF specified.)
I.e. you have two pin capacitances (the "sender" and the "receiver"
chip) and the capacitance of the wire itself connected to the pin of
the chip. Each time an output pin switches high->low or low->high,
this capacitance has to be charged/discharged. I.e. during this short
period a current of approximately 16mA is flowing through the pin.
(Again: think about multiple pins switching at the same time.)

That's the theory.

Now to the practical stuff:

Because of the "spiky" current consumption of digital logic it has
become custom in the field of electronics to attach a 100nF
capacitor to each power supply pin, to ensure the power supply has a
low inductance and low resistance "power source" for the switching
time. This has been done since at least the 1970s, when the first
74xx logic family appeared. You can see this still in DIL sockets
sold with integrated 100nF capacitors. The capacitor is connected
directly between a power and a ground pin if possible, to ensure
minimal resistance between the capacitor and the chip. You cannot
group those capacitors together at one pin and just connect the other
pins to the power suppl
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hello, Soren.

You wrote on 8 April 2012, 14:59:21:

> I still have the problem that nobody running FreeBSD and OpenBSD have
> reported similar issues, somebody correct me if I'm wrong.

It seems that I have some interference between eth0 and the WiFi card in the MiniPCI slot, but I've always attributed it to heat and/or a damaged eth0 port. I'll try to use it with the MiniPCI card removed in the near future (with the card installed, eth0 cannot get a link from my provider's switch over a long cable, while eth1 works without any problems. Also, eth0 works with my home switch over a short 1m cable).

--
// Black Lion AKA Lev Serebryakov
___
Soekris-tech mailing list
Soekris-tech@lists.soekris.com
http://lists.soekris.com/mailman/listinfo/soekris-tech
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hello, Attila.

You wrote on 8 April 2012, 14:10:22:

> Do you use eth0? If yes, try not using it. The VT6105M of eth0 shares
> its power supply pins and capacitors with the mini PCI interface.
> If your crashes are so rare, it might be enough that not using that
> interface is enough to get you into the regime that doesnt crash.

That sounds interesting. I could not use WiFi in the MiniPCI slot together with eth0 on my net5501 (eth0 could not establish a link at all), but I was sure that this was the result of a lightning strike, as I used eth0 for the "upstream" provider connection (the cable runs directly from the provider's switch, which is located under the roof of my 10-storey apartment building, while my apartment is on the 7th floor)...

--
// Black Lion AKA Lev Serebryakov
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
> I still have the problem that nobody running FreeBSD and OpenBSD have
> reported similar issues, somebody correct me if I'm wrong.

Just to point out one:
http://lists.soekris.com/pipermail/soekris-tech/2011-December/017988.html
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, 08 Apr 2012 12:59:21 +0200 Soren Kristensen wrote:

> Attila Kinali wrote:
> > You repeat this argument over and over. But apearently, you are the only
> > one who knows about these race conditions. I cannot find any reference
> > to the race condition on the VT6105M at all. And for the ath9k race,
> > the only one i could find was fixed october 2010 in the mainline kernel.
> > Can you provide us with references to what race conditions you mean and
> > where they are to be found?
>
> From my post on 12/7/2011:
>
> Looking though the archieves I found two reported issues, both on Linux.
>
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting
> the Linux VIA VT6105M driver to have bug, and how to fix it:
>
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

I cannot tell whether this has been fixed or not. The via-rhine driver's IRQ handling has seen a big overhaul since then. It could be that it's fixed, it could be that it's not. Unfortunately, the reporter did not mention which kernel version he was using, so I cannot even check whether the code path is now properly locked, as I can only guess what changes he made exactly (a proper patch would have been helpful).

> 2) And "green" reporting a fix to either ath9k, or all wireless drivers,
> in his post on Jan 25, 2011:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

Here again, no mention of the kernel version that was used. It could very well be that this is the race condition I found, which was fixed in Oct 2010. Please note that "compat-wireless" is the development version from the wifi driver team, released for the benefit of those who do not want to wait for the mainline kernel to pick up their changes.

As I said, my kernel is version 3.2.1, which was released on Jan 12, 2012. This is considerably newer than any of those reports. I am sure that the ath9k fix has entered the mainline kernel since then (they sync up on every second kernel release at least). And the other one probably too, if it has been properly reported to the kernel developers.

> > Soren, if you really have an explenation how a software bug (a race
> > condition as you say) can be fixed with a soldering iron, i really like
> > to hear that. I have systems that experience race conditions under
> > every once in a while and i'd like to fix those as well with my soldering
> > iron.
>
> Attila, thanks for the detailed testing done. I agreed with you that
> adding capacitors should not change behavior if it's a software problem
> alone.

Finally you believe me.

> I will still state that the net5501 has the decoupling it needs for
> itself and the expansions it's designed for. One possible sources of
> problem could be the power supply regulators as they located just behind
> the mini-PCI slot, RF could be affecting t.ex. the compensation circuit,
> so adding decoupling capacitors just fix the symptoms.

Could be, but I think that's unlikely. I rather think that the wlan card has a spiky power need, probably drawing ~300mA for very short periods. The VT6105M should have about the same spikiness, but with lower power consumption. As both share their power supply vias and capacitors, it's only a matter of time until both spike at the same time, moving one or both subsystems into an operating condition outside the specs.

> I would also like to investigate the problem further. Can you please
> tell me the exact wlan card ?

I bought the following from soekris.eu:

Complete wireless Bundle 9220-2
Power Supply, 12V, 3.0A, IEC320-C8 inlet 90V-264V Worldwide
2.5" SATA hard drive mounting kit for the net5501
net5501-70 Board and 1 Slot standard Case

Please note that I had to use different antennas, as the net5501 has no space to drill holes for the TNC connectors. Why you sell them together is beyond me...

> And can you please ensure that the vt6105 driver is updated to a fixed
> one, would really love data after that is done

As I said, I cannot. But I'm using a pretty recent kernel. If it isn't fixed yet after 2 years...

> I still have the problem that nobody running FreeBSD and OpenBSD have
> reported similar issues, somebody correct me if I'm wrong.

Have you considered that it might be because there are many more people using Linux than *BSD? Hence reports from Linux users are much more likely.

Attila Kinali
--
Why does it take years to find the answers to the questions one should have asked long ago?
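A rough back-of-the-envelope sketch of the spike argument above, in Python. The 300mA figure is the estimate from the text; the 1ns current edge, the ~1nH of inductance for a via plus a short track, and the single 100nF capacitor are assumed ballpark values, not measurements of the net5501.

```python
def inductive_droop(inductance_h, delta_i_a, rise_time_s):
    """Voltage across a series inductance for a linear current ramp: V = L * dI/dt."""
    return inductance_h * delta_i_a / rise_time_s


def capacitor_droop(capacitance_f, current_a, duration_s):
    """Voltage lost by a capacitor that supplies a constant current: dV = I * t / C."""
    return current_a * duration_s / capacitance_f


# 300 mA is the spike estimate from the text; everything below it is ASSUMED.
SPIKE_A = 0.3
EDGE_S = 1e-9    # assumed 1 ns current edge
VIA_H = 1e-9     # assumed ~1 nH for a via plus a short track
CAP_F = 100e-9   # one 100 nF ceramic capacitor right at the pin

if __name__ == "__main__":
    v_ind = inductive_droop(VIA_H, SPIKE_A, EDGE_S)
    v_cap = capacitor_droop(CAP_F, SPIKE_A, EDGE_S)
    print(f"drop across the shared via inductance: {v_ind:.3f} V")
    print(f"sag of a local 100 nF capacitor:       {v_cap * 1e3:.1f} mV")
```

With these assumed numbers, the shared inductance drops roughly 0.3V during the edge, while a capacitor sitting directly at the pin sags by only a few millivolts. If two consumers spike through the same via at the same moment, the inductive drop adds up, which is the scenario described above.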
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, 8 Apr 2012 12:49:17 +0200 Flemming Jacobsen wrote:

> Attila Kinali wrote:
> > If i had a software problem, why does the behaviour change if
> > i connect two 1000uF capacitors to the powersupply pins available
> > at J5? And how come the whole issue disapears after soldering three
> > dozen of capacitors at various places?
>
> This screams "My PSU was sub par for the load I gave it". Or
> (less likely) "I overloaded the onboard power converter".

Unfortunately, even a 90W laptop power supply was sub par for the load a single net5501 gave it... And Soren himself said that the internal power supply is designed for 3.5A on each rail. So that should be more than good enough...

> And adding 30+ capacitors to the board at random, speaks volumes
> about the level of EE knowledge this individual has.

Who said that I added the capacitors at random? You don't have to believe me, but I know what I'm doing[1]. I added the capacitors where I thought they would have the most impact on system/power supply stability.

Attila Kinali

[1] As I have stated previously, I design boards of the level of the net5501 for a living. I know where the usual problems are and I know how to get around them.

--
Why does it take years to find the answers to the questions one should have asked long ago?
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hi Attila,

Attila Kinali wrote:
> On Sun, 08 Apr 2012 01:34:58 +0200
> Soren Kristensen wrote:
>
>> Alvar Kusma wrote:
>>> As I have stated before, afaik the net5501 do not have any design issues, Attila's problem is most likely either software related,
>>>
>>> Please, can you explain, why similar board from PCEngines (Alix 2D13)
>>> with same software (OpenWRT image) just works, but Soekris board shows
>>> some unstability? Can you explain, why this same exact software works on
>>> one net5501 without a glitch over year now, but two other units show
>>> unstability signs - random hangs, sometimes works over month, sometimes
>>> crashes 2 times a day? This is still a mystery for me. Just bad luck?
>>
>> No, pretty simple:
>>
>> The Linux VT6105M driver has interrupt race problems, reported to have
>> been fixed recently, don't know if it have ported to the main Linux sources.
>>
>> The Atheros wlan drivers seems to also have interrupt race problems,
>> don't remember if that have been fixed too.
>
> You repeat this argument over and over. But apearently, you are the only
> one who knows about these race conditions. I cannot find any reference
> to the race condition on the VT6105M at all. And for the ath9k race,
> the only one i could find was fixed october 2010 in the mainline kernel.
> Can you provide us with references to what race conditions you mean and
> where they are to be found?

From my post on 12/7/2011:

Looking through the archives I found two reported issues, both on Linux.

1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting that the Linux VIA VT6105M driver has a bug, and how to fix it:

http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

2) And "green" reporting a fix to either ath9k, or all wireless drivers, in his post on Jan 25, 2011:

http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

> But to kill that driver bug argument once and for all please explain the
> following which i've seen during my test:
>
> Setup:
> net5501 running debian/stable with a self build vanilla linux kernel
> version 3.2.1. Connected to the net5501 are a notebook sata harddisk
> and AR9200 wlan card. The LAN is connected on eth0.
>
> If the WLAN card is _not_ running (driver not loaded or disbaled by rfkill)
> no problems can be seen. No crashes, nothing. For months.
>
> Test #1:
> Setup as above. WLAN card enabled, traffic going trough both WLAN and eth0.
> Result: System crashes in 2minutes (+/- 1 minute). No Oops, as would be
> seen with most driver bugs on the serial console. It just hangs.
>
> Test #2:
> Setup and test procedure as in Test #1, but with two 1000uF capacitors
> connected to J5 at 5V and 3.3V power supplies.
> Result: System crashes in 5minutes (-1min, +2min).
>
> Test #3:
> Setup nd test procedure as in Test #2, but with three dozen ceramic capacitors
> soldered on the board.
> Result: No crash at all after one week. Even heavy system load doesn't
> affect the system anymore.
>
> Notes:
> 1) Test #1 and Test #2 were repeated several dozen times. Although i have not
> writen down the times it takes to crash the system and didnt do a
> mathematically rigourus statistical analysis, i can state that the
> difference between Test #1 and #2 is significant. Ie the additional capacitors
> improve the situation considerably.
>
> 2) I run Test #1 and #2 before i did the modifications for Test #3 to ensure
> the bug is still present and can be reproduced. I did not do any software
> upgrades or any configuration changes in between. Ie if it would be a software
> bug, it would be present in all three tests.
>
> Soren, if you really have an explenation how a software bug (a race
> condition as you say) can be fixed with a soldering iron, i really like
> to hear that. I have systems that experience race conditions under
> every once in a while and i'd like to fix those as well with my soldering
> iron.

Attila, thanks for the detailed testing. I agree with you that adding capacitors should not change the behavior if it's a software problem alone.

I will still state that the net5501 has the decoupling it needs for itself and the expansions it's designed for. One possible source of the problem could be the power supply regulators, as they are located just behind the mini-PCI slot; RF could be affecting e.g. the compensation circuit, in which case adding decoupling capacitors just fixes the symptoms.

I would also like to investigate the problem further. Can you please tell me the exact wlan card?

And can you please ensure that the vt6105 driver is updated to a fixed one? I would really love to see data after that is done.

I still have the problem that nobody running FreeBSD or OpenBSD has reported similar issues; somebody correct me if I'm wrong.

Best Regards,

Soren Kristensen
CEO & Chief Engineer
Soekris Engineering, Inc.
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
> Do you use eth0? If yes, try not using it. The VT6105M of eth0 shares
> its power supply pins and capacitors with the mini PCI interface.

This is the reason I got net5501s: I need more than 3 ethernet NICs on a router, _real_ NICs, and not those VLAN-capable 5-port internal "smart switches" found in almost every 70€ SOHO router. No need for wlan, no need for a miniPCI slot... just a plain multi-ethernet router.
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Attila Kinali wrote:
> If i had a software problem, why does the behaviour change if
> i connect two 1000uF capacitors to the powersupply pins available
> at J5? And how come the whole issue disapears after soldering three
> dozen of capacitors at various places?

This screams "My PSU was sub par for the load I gave it". Or (less likely) "I overloaded the onboard power converter".

And adding 30+ capacitors to the board at random, speaks volumes about the level of EE knowledge this individual has.

Regards,
Flemming

--
Flemming Jacobsen Email: f...@batmule.dk
"People are more violently opposed to fur than leather because it's safer to harass rich women than motorcycle gangs." -- Unknown
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sat, 07 Apr 2012 23:03:43 +0300 Alvar Kusma wrote:

> > As I have stated before, afaik the net5501 do not have any design
> > issues, Attila's problem is most likely either software related,
>
> Please, can you explain, why similar board from PCEngines (Alix 2D13)
> with same software (OpenWRT image) just works, but Soekris board shows
> some unstability? Can you explain, why this same exact software works on
> one net5501 without a glitch over year now, but two other units show
> unstability signs - random hangs, sometimes works over month, sometimes
> crashes 2 times a day? This is still a mystery for me. Just bad luck?

Do you use eth0? If yes, try not using it. The VT6105M of eth0 shares its power supply pins and capacitors with the mini PCI interface. If your crashes are that rare, not using that interface might be enough to get you into a regime that doesn't crash.

Attila Kinali
--
Why does it take years to find the answers to the questions one should have asked long ago?
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, 08 Apr 2012 01:34:58 +0200 Soren Kristensen wrote:

> Alvar Kusma wrote:
> >
> >> As I have stated before, afaik the net5501 do not have any design
> >> issues, Attila's problem is most likely either software related,
> >
> > Please, can you explain, why similar board from PCEngines (Alix 2D13)
> > with same software (OpenWRT image) just works, but Soekris board shows
> > some unstability? Can you explain, why this same exact software works on
> > one net5501 without a glitch over year now, but two other units show
> > unstability signs - random hangs, sometimes works over month, sometimes
> > crashes 2 times a day? This is still a mystery for me. Just bad luck?
>
> No, pretty simple:
>
> The Linux VT6105M driver has interrupt race problems, reported to have
> been fixed recently, don't know if it have ported to the main Linux sources.
>
> The Atheros wlan drivers seems to also have interrupt race problems,
> don't remember if that have been fixed too.

You repeat this argument over and over. But apparently, you are the only one who knows about these race conditions. I cannot find any reference to a race condition on the VT6105M at all. And for the ath9k race, the only one I could find was fixed in October 2010 in the mainline kernel. Can you provide us with references to the race conditions you mean and where they are to be found?

But to kill that driver bug argument once and for all, please explain the following, which I have seen during my tests:

Setup:
net5501 running Debian stable with a self-built vanilla Linux kernel, version 3.2.1. Connected to the net5501 are a notebook SATA hard disk and an AR9200 wlan card. The LAN is connected on eth0.

If the WLAN card is _not_ running (driver not loaded, or disabled by rfkill), no problems can be seen. No crashes, nothing. For months.

Test #1:
Setup as above. WLAN card enabled, traffic going through both WLAN and eth0.
Result: System crashes in 2 minutes (+/- 1 minute). No Oops on the serial console, as would be seen with most driver bugs. It just hangs.

Test #2:
Setup and test procedure as in Test #1, but with two 1000uF capacitors connected to J5 at the 5V and 3.3V power supplies.
Result: System crashes in 5 minutes (-1min, +2min).

Test #3:
Setup and test procedure as in Test #2, but with three dozen ceramic capacitors soldered onto the board.
Result: No crash at all after one week. Even heavy system load doesn't affect the system anymore.

Notes:
1) Test #1 and Test #2 were repeated several dozen times. Although I did not write down the times it took to crash the system and didn't do a mathematically rigorous statistical analysis, I can state that the difference between Test #1 and #2 is significant. I.e. the additional capacitors improve the situation considerably.

2) I ran Test #1 and #2 again before I did the modifications for Test #3, to ensure the bug was still present and could be reproduced. I did not do any software upgrades or configuration changes in between. I.e. if it were a software bug, it would be present in all three tests.

Soren, if you really have an explanation of how a software bug (a race condition, as you say) can be fixed with a soldering iron, I would really like to hear it. I have systems that experience race conditions every once in a while, and I'd like to fix those with my soldering iron as well.

Attila Kinali
--
Why does it take years to find the answers to the questions one should have asked long ago?
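Note 1 says the crash times were not written down. Had they been, a permutation test would be one simple way to back up the claim that Test #2 survives longer than Test #1. The following is a minimal Python sketch of that idea; the two lists are invented placeholder values in the ranges reported above, NOT measurements from the thread.

```python
import random
from statistics import mean


def permutation_p_value(a, b, rounds=10_000, seed=0):
    """One-sided permutation test.

    Returns the fraction of random relabellings in which
    mean(second group) - mean(first group) is at least as large
    as the difference observed for (a, b).
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            hits += 1
    return hits / rounds


# HYPOTHETICAL minutes-to-crash, made up for illustration only.
test1_minutes = [1.5, 2.0, 2.5, 1.0, 3.0, 2.0, 1.5, 2.5]
test2_minutes = [4.5, 5.0, 6.0, 4.0, 7.0, 5.5, 5.0, 4.5]

if __name__ == "__main__":
    p = permutation_p_value(test1_minutes, test2_minutes)
    print(f"one-sided p-value: {p:.4f}")
```

A small p-value means that a difference this large would rarely arise if the capacitors on J5 made no difference. With real logged crash times in place of the placeholders, the same function applies unchanged.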
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
> The net5501 have 4 ethernet ports, 1 Mini-PCI slot and 2 PCI expansion
> slots each with 4 interrupt lines. Therefore the 4 interrupt pins on the
> Geode LX need to be shared along the possible devices. PCI Interrupt
> sharing is part of the PCI specification and should in principle not be
> an issue with correct drivers.

After looking at the 2D13 schematics I can confirm that all 3 VT6105M NICs are connected to different INT# pins on the CS5536 (INTB#, INTC#, INTD#). The MiniPCI slot is routed to INTA# and INTB# (shared with eth0).

If you count J4 as one net5501 PCI expansion slot, then this is practically unusable for everyone. Is it correct to assume that, when unused, the MiniPCI slot doesn't take an interrupt line either? My exact case: net5501, all four interfaces in use, plus an Intel EEPro/100 card in the PCI slot, and that's all.

> can vary depending on exact version and usage patterns

Yes, I must confess that this can cover almost everything...
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sat, 07 Apr 2012 20:32:41 +0200 Soren Kristensen wrote:

> Are not really interested in commenting on Attila, especially as you
> seems to have received private emails off list (or the one with photo is
> stuck in our spam filter).

It is not private and it is not stuck in your spam filter. I sent it to the mailing list, but apparently a 30kB picture and a long description of what it shows is too big, so it is stuck in the moderation queue. As the moderator of the mailing list has not rejected the mail yet, I haven't bothered to send it again without the picture.

> But of course the net5501 have power and ground planes, that should be
> obvious to any engineer, it's actually a 6 layer board. And the memory
> have more than enough decoupling capacitors of different sizes, just go
> to newegg.com and look at some pictures showing how few capacitors
> module manufacturers believe are needed

On the contrary, it does not have enough capacitors.

> As I have stated before, afaik the net5501 do not have any design
> issues, Attila's problem is most likely either software related, wlan
> card related, or he might have a defect board (rare, but do of course
> happens), which I have offered to replace before. But of course not if
> he have ruined it with his experiments

If I had a software problem, why does the behaviour change if I connect two 1000uF capacitors to the power supply pins available at J5? And how come the whole issue disappears after soldering three dozen capacitors at various places?

Attila Kinali
--
Why does it take years to find the answers to the questions one should have asked long ago?
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
> The Linux VT6105M driver has interrupt race problems, reported to have

The 2D13 has the same ethernet chip. Just FYI: I have 10+ Alix boards, all without this sort of problem. I use net5501 boards simply because I need 4 or 5 LAN interfaces, and the Alix is limited to 3 (I don't count USB NICs as a real alternative).

> Afaik, those reporting problems with crashing are all running Linux with
> Atheros wlan.

I don't have wlan cards at all, except one (ath5k based, with a 2D13, at home). I prefer to run the AP on a dedicated device (WRT54GL or similar), so this is not my case.

Anyway, the interrupt line explanation makes some sense, but it doesn't explain one of my problems. I swapped net5501 units between two different sites (with the CF cards and their software swapped between the net5501s), and the instability "walks" together with the unit. I tried different power supplies too. This is not the first time I have mentioned this.
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, Apr 8, 2012 at 8:34 AM, Soren Kristensen wrote:
> Hi Alvar,
>
> Alvar Kusma wrote:
>>
>>> As I have stated before, afaik the net5501 do not have any design
>>> issues, Attila's problem is most likely either software related,
>>
>> Please, can you explain, why similar board from PCEngines (Alix 2D13)
>> with same software (OpenWRT image) just works, but Soekris board shows
>> some unstability? Can you explain, why this same exact software works on
>> one net5501 without a glitch over year now, but two other units show
>> unstability signs - random hangs, sometimes works over month, sometimes
>> crashes 2 times a day? This is still a mystery for me. Just bad luck?
>
> No, pretty simple:
>
> The Linux VT6105M driver has interrupt race problems, reported to have
> been fixed recently, don't know if it have ported to the main Linux sources.
>
> The Atheros wlan drivers seems to also have interrupt race problems,
> don't remember if that have been fixed too.
>
> The Alix 3D13 have few expansion options, just 3 ethernet ports and 1
> Mini-PCI slot, and therefore don't need to share the 4 possible
> interrupt pins on the Geode LX.
>
> The net5501 have 4 ethernet ports, 1 Mini-PCI slot and 2 PCI expansion
> slots each with 4 interrupt lines. Therefore the 4 interrupt pins on the
> Geode LX need to be shared along the possible devices. PCI Interrupt
> sharing is part of the PCI specification and should in principle not be
> an issue with correct drivers.
>
> But can easily be an issue on less than perfect drivers with interrupt
> race problems and not much testing on slow processors, and can vary
> depending on exact version and usage patterns
>
> Afaik, those reporting problems with crashing are all running Linux with
> Atheros wlan.

Not saying that you are wrong, but FYI: it also happens to me when using a Broadcom wlan card in Linux.
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hi Alvar,

Alvar Kusma wrote:
>
>> As I have stated before, afaik the net5501 do not have any design
>> issues, Attila's problem is most likely either software related,
>
> Please, can you explain, why similar board from PCEngines (Alix 2D13)
> with same software (OpenWRT image) just works, but Soekris board shows
> some unstability? Can you explain, why this same exact software works on
> one net5501 without a glitch over year now, but two other units show
> unstability signs - random hangs, sometimes works over month, sometimes
> crashes 2 times a day? This is still a mystery for me. Just bad luck?

No, pretty simple:

The Linux VT6105M driver has interrupt race problems. They are reported to have been fixed recently, but I don't know if the fix has been ported to the main Linux sources.

The Atheros wlan drivers also seem to have interrupt race problems. I don't remember if that has been fixed too.

The Alix 3D13 has few expansion options, just 3 ethernet ports and 1 Mini-PCI slot, and therefore doesn't need to share the 4 possible interrupt pins on the Geode LX.

The net5501 has 4 ethernet ports, 1 Mini-PCI slot and 2 PCI expansion slots, each with 4 interrupt lines. Therefore the 4 interrupt pins on the Geode LX need to be shared among the possible devices. PCI interrupt sharing is part of the PCI specification and should in principle not be an issue with correct drivers.

But it can easily be an issue with less than perfect drivers that have interrupt race problems and have not seen much testing on slow processors, and it can vary depending on the exact version and usage patterns.

Afaik, those reporting problems with crashing are all running Linux with Atheros wlan.

Best Regards,

Soren Kristensen
CEO & Chief Engineer
Soekris Engineering, Inc.
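On a running Linux system, which devices actually end up sharing an interrupt line can be read from /proc/interrupts, where the handlers registered on a shared line are listed separated by commas. A minimal Python sketch follows; the sample text and its device names are made up for illustration and are not taken from a net5501, and the parsing is a heuristic that assumes handler names contain no spaces.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [handler, ...]} for numbered IRQ lines
    that have more than one handler registered."""
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # header line, or named entries such as NMI/ERR
        parts = [p.strip() for p in rest.split(",")]
        if len(parts) < 2 or not parts[0].split():
            continue  # only one handler on this line
        # The last column of the first chunk is the first handler name.
        first = parts[0].split()[-1]
        shared[int(label)] = [first] + parts[1:]
    return shared


# HYPOTHETICAL sample in the usual /proc/interrupts layout.
SAMPLE = """\
           CPU0
  0:     123456    XT-PIC-XT-PIC    timer
 10:      45678    XT-PIC-XT-PIC    eth0, ath
 11:       9012    XT-PIC-XT-PIC    eth1
NMI:          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    for irq, handlers in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq} is shared by: {', '.join(handlers)}")
```

On a real system, the same function can be given the contents of /proc/interrupts instead of the sample string.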
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
> As I have stated before, afaik the net5501 do not have any design
> issues, Attila's problem is most likely either software related,

Please, can you explain why a similar board from PCEngines (Alix 2D13) with the same software (OpenWRT image) just works, but the Soekris board shows some instability? Can you explain why this exact same software has worked on one net5501 without a glitch for over a year now, but two other units show signs of instability: random hangs, sometimes working for over a month, sometimes crashing 2 times a day? This is still a mystery to me. Just bad luck?

Btw,

> Are not really interested in commenting on Attila

What good does such a statement do? :-|
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Hi Jeff,

JSL Internet wrote:
> Attila,
>
> Thank you for the detailed post. It is clear that you have some
> understanding of best practices regarding digital PCB layout. However,
> you've made a number of assumptions and generalizations. Are you
> certain that there are no power and ground layers within the net5501
> multi-layer board? I would be surprised if this were true, but I'll
> leave it up to Soren to clarify. The photo you posted shows four bypass
> capacitors surrounding the RAM chip in very close proximity. It would
> be difficult for any layout designer to get them any closer to the
> chip. Sharing capacitors is just fine as long as there are separate
> runs from the capacitor to each chip. The whole point of the capacitors
> is to offset the inductance of the PCB power runs. It is physically
> impossible to have zero-length runs from the capacitors to the chips.
> The chips have small internal capacitors anyway. Your photo
> demonstrates a good PCB design.

I'm not really interested in commenting on Attila, especially as you seem to have received private emails off list (or the one with the photo is stuck in our spam filter).

But of course the net5501 has power and ground planes; that should be obvious to any engineer. It's actually a 6 layer board. And the memory has more than enough decoupling capacitors of different sizes; just go to newegg.com and look at some pictures showing how few capacitors module manufacturers believe are needed.

> If I were having problems with a small SBC (net5501) that only
> occurred when it was attached to an RF transmitter (your WLAN card), I
> would be looking at the RF susceptibility of the SBC, and the isolation
> of the transmission line and transmitting elements from the SBC. It's
> doubtful that Soren did any RF susceptibility testing or analysis of the
> computer. Most computer manufacturers do not bother with such things.
> The metal boxes he sells for the SBCs should take care of most of the
> problems anyway. It's entirely likely that the problems that you've
> "fixed" are related to the RF susceptibility of the net5501. Small
> pieces of carefully applied brass or mu-metal in selected circuit areas
> would probably have accomplished the same thing that you did by adding
> some capacitors.

As I have stated before, afaik the net5501 does not have any design issues. Attila's problem is most likely either software related, wlan card related, or he might have a defective board (rare, but it does of course happen), which I have offered to replace before. But of course not if he has ruined it with his experiments.

Best Regards,

Soren Kristensen
CEO & Chief Engineer
Soekris Engineering, Inc.

> Jeff
>
> On 04/06/2012 07:03 AM, Attila Kinali wrote:
>>
>> The power supply of the net5501 and how it is distributed to the circuitry
>> is disregarding all common good design practices. Hence leading to problems
>> in certain load and use conditions. These problems can be bit errors or
>> complete crashes.
>>
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
Attila,

Thank you for the detailed post. It is clear that you have some understanding of best practices regarding digital PCB layout. However, you've made a number of assumptions and generalizations. Are you certain that there are no power and ground layers within the net5501 multi-layer board? I would be surprised if this were true, but I'll leave it up to Soren to clarify. The photo you posted shows four bypass capacitors surrounding the RAM chip in very close proximity. It would be difficult for any layout designer to get them any closer to the chip. Sharing capacitors is just fine as long as there are separate runs from the capacitor to each chip. The whole point of the capacitors is to offset the inductance of the PCB power runs. It is physically impossible to have zero-length runs from the capacitors to the chips. The chips have small internal capacitors anyway. Your photo demonstrates a good PCB design.

If I were having problems with a small SBC (net5501) that only occurred when it was attached to an RF transmitter (your WLAN card), I would be looking at the RF susceptibility of the SBC, and the isolation of the transmission line and transmitting elements from the SBC. It's doubtful that Soren did any RF susceptibility testing or analysis of the computer. Most computer manufacturers do not bother with such things. The metal boxes he sells for the SBCs should take care of most of the problems anyway. It's entirely likely that the problems that you've "fixed" are related to the RF susceptibility of the net5501. Small pieces of carefully applied brass or mu-metal in selected circuit areas would probably have accomplished the same thing that you did by adding some capacitors.

Jeff

On 04/06/2012 07:03 AM, Attila Kinali wrote:
>
> The power supply of the net5501 and how it is distributed to the circuitry
> is disregarding all common good design practices. Hence leading to problems
> in certain load and use conditions. These problems can be bit errors or
> complete crashes.
>
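A note on the decoupling argument in this message: the statement that bypass capacitors exist to offset the inductance of the PCB power runs can be put in rough numbers with V = L * di/dt. The sketch below is only an illustration. The 5 nH run inductance, 100 mA current step, 1 ns edge and 50 mV droop budget are assumed values chosen for the example, not measurements of the net5501, and the function names are hypothetical.

```python
# Rough decoupling estimate. All numeric values are illustrative
# assumptions, NOT measurements taken from a net5501 board.


def inductive_droop(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Voltage across a supply run of given inductance for a current step.

    Implements V = L * di/dt with a linear current ramp.
    """
    return inductance_h * delta_i_a / delta_t_s


def hold_up_capacitance(delta_i_a: float, delta_t_s: float, max_droop_v: float) -> float:
    """Capacitance that can supply delta_i for delta_t within max_droop.

    Implements C = I * dt / dV, ignoring the capacitor's own
    series inductance and resistance.
    """
    return delta_i_a * delta_t_s / max_droop_v


if __name__ == "__main__":
    run_inductance = 5e-9   # assumed: 5 nH of trace/via inductance
    current_step = 0.1      # assumed: 100 mA load step
    edge_time = 1e-9        # assumed: 1 ns switching edge
    droop_budget = 0.05     # assumed: 50 mV allowed droop

    droop = inductive_droop(run_inductance, current_step, edge_time)
    cap = hold_up_capacitance(current_step, edge_time, droop_budget)

    print(f"Droop across the run without local capacitor: {droop:.2f} V")
    print(f"Local capacitance needed for the step: {cap * 1e9:.2f} nF")
```

With these assumed numbers the run alone would drop about 0.5 V during the edge, while roughly 2 nF sitting at the pin would cover the same step within the 50 mV budget. The model is deliberately crude: it ignores the capacitor's own parasitics and the plane capacitance, so it only shows why the distance and number of capacitors enter the discussion at all.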
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On 4 Apr 2012, Alan told this:

> On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali wrote:
>
>> I finally got the time to work on this again and got my net5501
>> working without crashes even under heavy load and using wlan at full
>> power. At least not within 24h.
>
> Great that someone is working on this, but 24 hours is not much. If I
> recall correctly one of my net5501 could go up to 2 weeks (of light
> use) without crashing.

That's not very impressive! Mine is normally up for >70 days (depending on kernel upgrades), and essentially never crashes for non-software reasons. It's not heavily loaded most of the time, but it sees large load spikes sometimes and is doing a lot of network, uh, work. If you're only seeing two weeks between crashes, something is still wrong!

> Like others have said, I am eagerly waiting for the description,
> schematics and pictures to fix this.

And I'm wondering why it hasn't affected me. Perhaps because no wlan is involved, so the power draw is lower?

--
NULL && (void)
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali wrote:

> I finally got the time to work on this again and got my net5501
> working without crashes even under heavy load and using wlan at full
> power. At least not within 24h.

Great that someone is working on this, but 24 hours is not much. If I recall correctly one of my net5501 could go up to 2 weeks (of light use) without crashing.

Like others have said, I am eagerly waiting for the description, schematics and pictures to fix this.
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved (Attila Kinali)
On Tue, 2012-04-03 at 01:27 +0200, Attila Kinali wrote:
> On Mon, 02 Apr 2012 09:54:16 -0700
> JSL Internet wrote:
>
> > You claim to have solved a mystery but you did not provide any
> > technical details of the problem or the solution. This is a support
> > list but instead of being supportive or asking for support, you are
> > being critical of a product that most users do not have any trouble
> > with. Without knowing what you did to the net5501 to "fix" it, nobody
> > benefits and your credibility suffers. I'm an experienced RF/digital
> > engineer and I've seen some "creative" explanations and solutions here
> > on the Soekris list.
>
> My credibility? I've never been called an idiot as often as on
> this mailing list; do you really think I have any credibility to lose?
>
> If you are an RF/digital engineer, just take a random net5501 board
> or a picture of it. Follow the power supply lines and you will see
> why some people have crashes and some do not. It is really that obvious!
>
> How I fixed it is simple: take a handful of capacitors, sprinkle
> them over the board and solder them where they fall. Depending on
> the load conditions you will also need to add a couple of wires.
>
> Attila Kinali

I would like to see some pictures (.jpg) of your modified board. Share your expertise with the rest of the list.

Bob G
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved (Attila Kinali)
On Mon, 02 Apr 2012 09:54:16 -0700 JSL Internet wrote:

> You claim to have solved a mystery but you did not provide any
> technical details of the problem or the solution. This is a support
> list but instead of being supportive or asking for support, you are
> being critical of a product that most users do not have any trouble
> with. Without knowing what you did to the net5501 to "fix" it, nobody
> benefits and your credibility suffers. I'm an experienced RF/digital
> engineer and I've seen some "creative" explanations and solutions here
> on the Soekris list.

My credibility? I've never been called an idiot as often as on this mailing list; do you really think I have any credibility to lose?

If you are an RF/digital engineer, just take a random net5501 board or a picture of it. Follow the power supply lines and you will see why some people have crashes and some do not. It is really that obvious!

How I fixed it is simple: take a handful of capacitors, sprinkle them over the board and solder them where they fall. Depending on the load conditions you will also need to add a couple of wires.

Attila Kinali

--
Why does it take years to find the answers to the questions one should have asked long ago?
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved (Attila Kinali)
Attila,

You claim to have solved a mystery but you did not provide any technical details of the problem or the solution. This is a support list but instead of being supportive or asking for support, you are being critical of a product that most users do not have any trouble with. Without knowing what you did to the net5501 to "fix" it, nobody benefits and your credibility suffers. I'm an experienced RF/digital engineer and I've seen some "creative" explanations and solutions here on the Soekris list.

Did you try moving your WLAN card external antenna away from the net5501? Is the coaxial cable or the WLAN card itself defective? Does the combined load of the WLAN card and the net5501 exceed the current limit for the on-board switching power supply? Do you have a defective net5501? You may have soldered some SM capacitors onto your board and "fixed" a problem that is being caused by your WLAN card or your configuration.

Jeff Laing
soek...@jsl.com
Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved
On 01/04/12 12:05, Attila Kinali wrote:
> Can it be fixed? Not really. It's a design issue. The only way to
> really fix it is to change the design and redo the boards.
> If you already own a net5501, the only thing you can do is to patch
> it up until the crash probability reaches a level where you don't care
> anymore. And for this you need a soldering iron and some skill using it,
> as it means modifying a small pitch SMD PCB. Depending on how you use
> the net5501 this means anything from an hour of soldering to a full day
> or two of rework.

Care to share?