On Wed, 4 Apr 2012 11:01:51 +0900 Alan <[email protected]> wrote:
> On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali <[email protected]> wrote:
> >
> > I finally got the time to work on this again and got my net5501
> > working without crashes even under heavy load and using wlan at full
> > power. At least not within 24h.
>
> Great that someone is working on this, but 24 hours is not much. If I
> recall correctly one of my net5501 could go up to 2 weeks (of light
> use) without crashing.

Oh... compared to 2 minutes (!) for an unmodified board, or 5 minutes with only two 1000uF electrolytic capacitors connected to J5, it's pretty impressive. Besides, I only made the modifications last weekend, so I hadn't had the time to let it run for longer. It has now been running for almost a week with no crashes.

> Like others have said, I am eagerly waiting for the description,
> schematics and pictures to fix this.

I didn't take any pictures of the modified board, but I can describe the cause in more detail:

<summary>
The power supply of the net5501, and the way it is distributed to the circuitry, disregards common good design practices. This leads to problems under certain load and usage conditions, ranging from bit errors to complete crashes.
</summary>

Attached is a picture of a DDR SDRAM chip mounted on the bottom of the net5501. For your convenience, I marked the power supply pins red (VDD and VDDQ) and the ground pins blue (VSS and VSSQ). I chose this chip because it is one of the best examples of what I want to show, and it has very little surrounding circuitry that would distract from my point or make it less clear.

The first thing that strikes the eye is that there are only 4 capacitors around the chip, while it has 8 power supply pins. Even worse, those 4 capacitors are shared with the two adjacent SDRAM chips, effectively halving the number of capacitors "seen" by each chip.

The next thing you should notice is that there isn't a via visible for each power or ground pin.
This suggests that only one via (underneath the chip) has been used to connect each pin to its power/ground plane. You can also spot places where the same via is used for two pins.

Now, what does that mean?

Digital chips are beasts in terms of power supply. Most of the time they draw hardly any current, but when the clock switches from high to low (or low to high, or both), they draw a huge amount of current for a brief moment. One part of that current is used to switch the transistors inside the chip; another part goes into switching the output pins.

Simplified, you can model both the internal circuitry and the output pins as a CMOS inverter [1]. When the input A changes its logic level, first the gate capacitance has to be charged or discharged. Second, there is a very short period during which both transistors conduct, leading to the so-called shoot-through current. This current is limited only by the current-carrying capability of the transistors themselves. For internal circuits it is quite low per transistor (they don't have to conduct large currents), but because so many transistors switch at the same time, it cannot be neglected.

For the output pins it's a different matter. They are designed to provide large currents (at least 16mA per pin in the case of the DDR SDRAM). So the shoot-through current is significant for each pin, and the situation becomes worse when multiple pins switch at the same time.

Please keep in mind that the shoot-through current lasts only for a very short period of time, typically less than 1ns. On the one hand this helps, as only little energy is lost to shoot-through. On the other hand, it leads to very high frequency components on the supply.

The next big power hog is the capacitance connected to the chip. Each pin of a chip package has a capacitance on the order of 1-20pF (DDR SDRAM chips are specified with a pin capacitance of <5pF). That is, you have two pin capacitances (the "sender" and the "receiver" chip) plus the capacitance of the trace connected to the pin. Each time an output pin switches high->low or low->high, this capacitance has to be charged or discharged, i.e. during this short period a current of approximately 16mA flows through the pin. (Again: think about multiple pins switching at the same time.)

That's the theory. Now to the practical stuff:

Because of the "spiky" current consumption of digital logic, it has become customary in electronics to attach a 100nF capacitor to each power supply pin, so that the chip has a low-inductance, low-resistance local "power source" during the switching time. This has been standard practice since at least the 1970s and the early 74xx logic families; you can still see it in DIL sockets sold with integrated 100nF capacitors. The capacitor is connected directly between a power and a ground pin where possible, to keep the resistance between capacitor and chip minimal. You cannot group those capacitors together at one pin and just connect the other pins to the power supply and ground, because the traces and vias have a resistance and, more importantly, an inductance that cannot be neglected.

For fast digital chips, which have very high frequency components on their power supply pins, it became customary to connect a 10nF capacitor directly at the pin and a 100nF capacitor adjacent to it. This is because even those tiny capacitors have an inductance, and due to their internal structure this inductance dominates above the so-called self-resonance frequency. The self-resonance frequency is higher for smaller value capacitors, making them better suited for high frequency applications. The larger capacitor then provides the energy, while the smaller one "eats" the spikes.
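To put rough numbers on the two effects above, here is a small Python sketch. It uses I = C * dV/dt for the current needed to swing the pin-plus-trace capacitance, and f = 1/(2*pi*sqrt(L*C)) for a capacitor's self-resonance. The 5pF per pin and per trace, the full 2.5V swing, the 2ns edge and the 1nH equivalent series inductance (ESL) are all assumptions for illustration, not values measured on the net5501.

```python
import math

def charging_current(c_farad: float, dv_volt: float, dt_s: float) -> float:
    """Average current to swing a capacitance by dv_volt within dt_s: I = C * dV/dt."""
    return c_farad * dv_volt / dt_s

def self_resonant_freq(c_farad: float, esl_henry: float) -> float:
    """Series self-resonant frequency of a capacitor: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_henry * c_farad))

# Assumed load per output: sender pin + receiver pin + short trace, 5 pF each.
C_LOAD = 3 * 5e-12
V_SWING = 2.5   # assumed full 2.5 V swing (a worst case)
T_EDGE = 2e-9   # assumed 2 ns edge
ESL = 1e-9      # assumed 1 nH ESL, taken as identical for both capacitor values

i_pin = charging_current(C_LOAD, V_SWING, T_EDGE)
print(f"one output pin:   {i_pin * 1e3:.2f} mA")
print(f"16 pins together: {16 * i_pin * 1e3:.0f} mA")
for c in (100e-9, 10e-9):
    f = self_resonant_freq(c, ESL)
    print(f"{c * 1e9:.0f} nF self-resonates at about {f / 1e6:.1f} MHz")
```

Under these assumptions one pin needs roughly 19mA, in the same range as the 16mA drive rating, and 16 data lines switching together need about 300mA. With equal ESL, the 10nF part resonates at sqrt(10), about 3.2 times, the frequency of the 100nF part, which is why it is placed closest to the pin.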
Also, for high current chips like SDRAMs, you generally add a larger capacitor (somewhere in the range of 1-10uF) next to the chip, to catch the lower frequency components (the "bumps", so to speak) that the 100nF capacitors couldn't catch. The placement of this capacitor is not as critical, as it is "only" for the "low" frequency components. But it should still be as close to the chip as possible, with one capacitor per chip.

Additionally, each power supply and ground pin is connected to its plane in the middle of the board by two vias. This is done to reduce the inductance of the connection: two vias in parallel roughly halve the inductance.

Ignoring these common engineering practices is generally a bad idea. It leads to so-called ground bounce, where the local supply voltage at the chip drops because of the inductance and resistance of the traces/vias leading to the chip. Even worse: because the inductance/resistance at the power supply pins and at the ground pins is not the same, the chip's voltage levels will bounce around wildly depending on how much current is flowing where. At best, ground bounce leads to a decreased signal-to-noise ratio (a higher bit error rate); intermittently it leads to outright bit errors. In the worst case, it drives the chip into an improper operating state where it becomes dysfunctional (either not doing anything anymore, or doing wild things it shouldn't do, potentially leading to the destruction of itself or other chips).

You also do not share power supply and ground pins between chips unless you can ensure that they switch at different times. In this case, the SDRAM chips switch at exactly the same time, making the ground bounce problem even worse.

There is a way to mitigate this problem a little, at least the part of it that is caused by the capacitance of the output pin traces.
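The ground bounce argument boils down to V = L * di/dt across the inductance between the chip and its planes. The Python sketch below compares one via per pin, two vias per pin, and one via shared by two chips that switch simultaneously. The 1nH per via, the 300mA current step and the 2ns step duration are hypothetical round numbers, and the parallel-via formula is the ideal one that ignores mutual inductance between the vias.

```python
def ground_bounce(l_henry: float, di_amp: float, dt_s: float) -> float:
    """Voltage across a supply/ground inductance for a current step: V = L * di/dt."""
    return l_henry * di_amp / dt_s

def parallel_inductance(l_each_henry: float, n_vias: int) -> float:
    """Ideal n identical vias in parallel; mutual inductance between them is ignored."""
    if n_vias < 1:
        raise ValueError("need at least one via")
    return l_each_henry / n_vias

L_VIA = 1e-9  # assumed inductance of one via plus its short trace (hypothetical)
DI = 0.3      # assumed current step when many outputs of one chip switch together
DT = 2e-9     # assumed duration of that step

cases = {
    "one via per pin": ground_bounce(parallel_inductance(L_VIA, 1), DI, DT),
    "two vias per pin": ground_bounce(parallel_inductance(L_VIA, 2), DI, DT),
    # Two chips switching at the same instant push twice the current through one via.
    "one via shared by two chips": ground_bounce(parallel_inductance(L_VIA, 1), 2 * DI, DT),
}
for name, volts in cases.items():
    print(f"{name:30s} {volts * 1e3:5.0f} mV")
```

With these numbers the bounce goes from about 150mV to 75mV with a second via, and doubles to about 300mV when two simultaneously switching chips share one via. Closely spaced vias are magnetically coupled, so in practice the second via gives somewhat less than the ideal 2x improvement.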
If you put a resistor (usually 10-30 Ohm) into the trace, you "insulate" the downstream capacitance from the output pin, forming an R-C circuit. The R limits the current flowing into the capacitance downstream of the resistor. The main disadvantage is that the switching time is increased by the R-C time constant. A second disadvantage is that you add two additional pin capacitances (those of the resistor's two terminals) to the system. Overall, this technique helps only if the capacitance of the trace or of the "recipient" chip's pin is significantly higher than that of the "sender" chip's pin plus the part of the trace between the "sender" chip and the resistor.

You can see those resistors on the net5501 as small resistor networks between the SDRAM and the Geode chips. Please note that the resistors sit on a short trace, hence the trace capacitance is most likely below 10pF, probably in the range of 2-5pF. Also note that the Geode chip probably has pin capacitance characteristics similar to those of the SDRAM chips.

What is really interesting, though, is that the top side and bottom side SDRAM chips have resistors of different values: 33 Ohm on the top side, 22 Ohm on the bottom side. This is very unusual, as normally you'd choose the same value for all resistors, because the traces are usually routed to have the same properties (same length, same capacitance, same inductance). I can only guess that Soekris might have had problems with SDRAM ground bounce and thus increased the resistor value on the top side to mitigate this.

You can see these two problems (too few capacitors, and power supply pins shared between chips) throughout the board. Actually, I found them everywhere I cared to check.

But why doesn't it lead to crashes for all users, only for some?

Well, electronic circuits are not ideal, and no part is exactly the same as another. Capacitors are usually rated +/-10%; more precise ones get very expensive very quickly.
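The trade-off of the series resistors can be quantified with a single-pole RC model: time constant tau = R*C, 10%-90% rise time tau*ln(9) (about 2.2*tau), and an initial current of V/R into the uncharged downstream capacitance. The Python sketch below applies this to the 22 Ohm and 33 Ohm values seen on the board. The 5pF downstream capacitance, the 2.5V step and the ideal zero-impedance driver are assumptions; a real driver adds its own output resistance.

```python
import math

def rc_tau(r_ohm: float, c_farad: float) -> float:
    """Time constant of the series resistor driving the downstream capacitance."""
    return r_ohm * c_farad

def rise_time_10_90(r_ohm: float, c_farad: float) -> float:
    """10%-90% rise time of a single-pole RC step response: tau * ln(9)."""
    return rc_tau(r_ohm, c_farad) * math.log(9)

def peak_current(v_step: float, r_ohm: float) -> float:
    """Initial current into an uncharged capacitor through R (ideal driver assumed)."""
    return v_step / r_ohm

C_DOWNSTREAM = 5e-12  # assumed: trace after the resistor plus receiver pin
V_STEP = 2.5          # assumed full-swing step

for r in (22.0, 33.0):
    print(f"{r:.0f} Ohm: tau = {rc_tau(r, C_DOWNSTREAM) * 1e12:.0f} ps, "
          f"10-90% = {rise_time_10_90(r, C_DOWNSTREAM) * 1e12:.0f} ps, "
          f"peak = {peak_current(V_STEP, r) * 1e3:.0f} mA")
```

Going from 22 Ohm to 33 Ohm makes the edges 50% slower and cuts the peak current by a third, which is at least consistent with the guess that the larger value was chosen to tame ground bounce.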
For high volume production, +/-20% parts are often used because they are significantly cheaper. The same applies to digital logic: the voltage level at which a digital circuit switches varies from chip to chip, and even from transistor to transistor within a chip (+/-10% within a chip is fairly normal). This also affects the noise margin of a digital circuit, meaning that some systems will be more susceptible to noise than others. Usually this susceptibility is so low that you don't care about it (say, a flipped bit every few years). But if the design is driven to its limits (whatever the cause may be), the susceptibility rises dramatically and you see these "occasional", inexplicable crashes. In the case of the net5501, which ignores common design principles, even normal use (like inserting a wlan card) might drive it into this crash regime.

As I said in my previous mail, there is no real way to fix it. You cannot add a capacitor where one is missing, because there is no space. You cannot make lower inductance power supply and ground connections where there are none, because you cannot access the inner planes on which they are distributed. The only thing you can do is solder a few capacitors on top of the existing ones and solder in wires to decrease the inductance ever so slightly. The extent of the work depends on how exactly you use your net5501 and which part of the circuit causes the crash. As I said, it can be anything from an hour of soldering to a board rework that takes a day or two.

As you can tell, I'm quite pissed at all this, because Soren personally made it clear a few times that he thinks his design is flawless, calling the power supply "rock solid", and told me, sometimes in roundabout ways and sometimes quite directly, that I'm an idiot looking for hardware problems.
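Why only some boards crash can be illustrated with a crude Monte Carlo sketch: draw a capacitor spread and a switching-threshold spread per board, and count the boards whose supply bounce exceeds the remaining noise margin. Every number here is hypothetical (0.35V nominal margin, 1.25V threshold, uniform +/-20% capacitor and +/-10% threshold spreads), and so is the model that bounce scales as 1/C. None of it is measured on any net5501; the point is only the shape of the result.

```python
import random

def fraction_failing(n_boards: int, nominal_bounce_v: float, margin_v: float = 0.35,
                     v_threshold: float = 1.25, cap_tol: float = 0.20,
                     thresh_tol: float = 0.10, seed: int = 1) -> float:
    """Crude Monte Carlo: fraction of boards whose supply bounce exceeds the
    remaining noise margin. All defaults are hypothetical illustrations."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_boards):
        # Decoupling capacitance spread; assume bounce scales as 1/C (less charge reserve).
        c_factor = rng.uniform(1.0 - cap_tol, 1.0 + cap_tol)
        bounce = nominal_bounce_v / c_factor
        # Switching-threshold spread; a shift in either direction eats into the margin.
        shift = v_threshold * rng.uniform(-thresh_tol, thresh_tol)
        if bounce > margin_v - abs(shift):
            failures += 1
    return failures / n_boards

for bounce in (0.10, 0.20, 0.25, 0.30):
    print(f"nominal bounce {bounce:.2f} V -> {fraction_failing(10_000, bounce):.1%} of boards fail")
```

With comfortable headroom no simulated board fails at all; as the nominal bounce approaches the margin, the failing fraction climbs steeply, and which boards fail depends only on their individual part spreads.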
And consider that the net5501 is a very expensive board: it costs 220 USD, twice what PC-Engines wants for their Alix boards. (And you cannot tell me that a Swiss company has lower labor costs than a US company, or that it delivers lower quality. If anything, labor and production costs in Switzerland are higher.) I would have thought that at these prices one could expect a proper design, made with all due diligence.

Also, if you check the archives of this mailing list, you will see that crashes like mine have been reported repeatedly over the years. Soren even replied directly to some of those reports. So it is _not_ true that they were unaware that there might be issues with the net5501. They just ignored all reports and dismissed them as user errors.

Attila Kinali

[1] http://en.wikipedia.org/wiki/File:CMOS_Inverter.svg
[2] http://www.eetimes.com/electronics-news/4196917/Ground-Bounce-Primer
[3] http://www.fairchildsemi.com/an/AN/AN-640.pdf

-- 
Why does it take years to find the answers to the questions one should have asked long ago?
<<attachment: sram.jpg>>
_______________________________________________ Soekris-tech mailing list [email protected] http://lists.soekris.com/mailman/listinfo/soekris-tech
