Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-17 Thread Attila Kinali
On Sat, 07 Apr 2012 09:27:53 -0700
JSL Internet  wrote:


>   Thank you for the detailed post.  It is clear that you have some 
> understanding of best practices regarding digital PCB layout.  However, 
> you've made a number of assumptions and generalizations.  Are you 
> certain that there are no power and ground layers within the net5501 
> multi-layer board?  I would be surprised if this were true, but I'll 
> leave it up to Soren to clarify.

Uhm? I never said that there are no power planes. On the contrary,
I'm pretty sure the net5501 has power planes. If you look at the
board, especially at the layer beneath the surface, you don't see any
tracks; in fact, you don't see any structure at all. That is a 99% sure
sign that you are looking at a plane. Of course, something could block
the view down to the next layer and give the impression of a plane, but
that's very unlikely.

>  The photo you posted shows four bypass 
> capacitors surrounding the RAM chip in very close proximity.  It would 
> be difficult for any layout designer to get them any closer to the 
> chip.  Sharing capacitors is just fine as long as there are separate 
> runs from the capacitor to each chip.  The whole point of the capacitors 
> is to offset the inductance of the PCB power runs.  It is physically 
> impossible to have zero-length runs from the capacitors to the chips.  
> The chips have small internal capacitors anyway.  Your photo 
> demonstrates a good PCB design.

I did not say that the capacitors are far away. They are near enough.
What I criticized is that there are not enough of them. There are several
power supply pins per chip for a reason, and you are advised to put
a blocking capacitor at each of them. The board would have had enough
space for more capacitors. Hence, I say that the board clearly shows
_bad_ PCB design.


>   If I were having problems with a small SBC (net5501) that only 
> occurred when it was attached to an RF transmitter (your WLAN card), I 
> would be looking at the RF susceptibility of the SBC, and the isolation 
> of the transmission line and transmitting elements from the SBC.  It's 
> doubtful that Soren did any RF susceptibility testing or analysis of the 
> computer.  Most computer manufacturers do not bother with such things.  
> The metal boxes he sells for the SBCs should take care of most of the 
> problems anyway.  It's entirely likely that the problems that you've 
> "fixed" are related to the RF susceptibility of the net5501.  Small 
> pieces of carefully applied brass or mu-metal in selected circuit areas 
> would probably have accomplished the same thing that you did by adding 
> some capacitors.

No, RF susceptibility looks different. Besides, if the board were
susceptible to RF emissions from a WLAN card that it is sold with
on the Soekris website, I would say that something is very wrong.
But let's focus on technical arguments: As you said, you are an experienced
RF and digital engineer, so you surely know that wires that run
unprotected by a nearby ground plane and/or run in large loops are
susceptible to EMI. I cannot see any such wires/tracks on the board
at all. All run straight and quite short. Besides, the signals are all digital,
which means that the field strength needed to cause problems would have to
be large, larger than a hermetically shielded RF module can produce.

And what really destroys that argument is that soldering capacitors
to the power supply pins fixes the crashes. The power supply is
very low impedance; to induce even a microvolt into
such a wire (which runs a couple of mm from the plane up to the pin),
you need multiple watts of directed signal. That's something you don't
have in such a system.

Attila Kinali
-- 
Why does it take years to find the answers to
the questions one should have asked long ago?
___
Soekris-tech mailing list
Soekris-tech@lists.soekris.com
http://lists.soekris.com/mailman/listinfo/soekris-tech


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-11 Thread Bjarke Istrup Pedersen
2012/4/11 Svenning Sørensen :
>
>
> On 10-04-2012 15:12, Jan Ceuleers wrote:
>> Svenning Sørensen wrote:
>>> Two machines sending high rates of UDP packets to each other would make
>>> the box hang rather fast, often within a few seconds under certain
>>> loads.
>>> I made a few fixes to the 3.2.13 driver which seem to have solved the
>>> hangs as well as another major problem (the one addressed in 3.3),
>>> namely missed timer ticks due to too much work in the via-rhine
>>> interrupt handler.
>>> It has now been running for a week under high load without hanging at
>>> all.
>>
>> Svenning,
>>
>> Would you consider submitting this to netdev once you believe the
>> problem is solved?
>>
>> Thanks, Jan
>
> Actually it seems they've already spotted it on the netdev list :)

I really wish people had picked it up back in January and
helped catch some of the bugs then, so we wouldn't have to wait for 3.4
before it becomes really usable :o)

If people could try and give v3.4-rc2 with this patch applied a run,
it would be really helpful: http://patchwork.ozlabs.org/patch/151700/
So far it seems stable to me, but my setup most likely won't catch
every problem.

/Bjarke

> Svenning
>
>


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-11 Thread Svenning Sørensen


On 10-04-2012 15:12, Jan Ceuleers wrote:
> Svenning Sørensen wrote:
>> Two machines sending high rates of UDP packets to each other would make
>> the box hang rather fast, often within a few seconds under certain 
>> loads.
>> I made a few fixes to the 3.2.13 driver which seem to have solved the
>> hangs as well as another major problem (the one addressed in 3.3),
>> namely missed timer ticks due to too much work in the via-rhine
>> interrupt handler.
>> It has now been running for a week under high load without hanging at 
>> all.
>
> Svenning,
>
> Would you consider submitting this to netdev once you believe the 
> problem is solved?
>
> Thanks, Jan

Actually it seems they've already spotted it on the netdev list :)

Svenning





Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-10 Thread Jan Ceuleers
Svenning Sørensen wrote:
> Two machines sending high rates of UDP packets to each other would make
> the box hang rather fast, often within a few seconds under certain loads.
> I made a few fixes to the 3.2.13 driver which seem to have solved the
> hangs as well as another major problem (the one addressed in 3.3),
> namely missed timer ticks due to too much work in the via-rhine
> interrupt handler.
> It has now been running for a week under high load without hanging at all.

Svenning,

Would you consider submitting this to netdev once you believe the 
problem is solved?

Thanks, Jan


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-10 Thread Svenning Sørensen


On 08-04-2012 22:54, Soren Kristensen wrote:

> And it is a fact that the Linux VT6105M driver had bugs, reported to be
> fixed in Linux 3.3. And that the atheros and/or wlan drivers had/have
> bugs, don't know if fixed.
>
> Upgrade to Linux >= 3.3 and we can continue talking.


I have observed the via-rhine bugs as well - not only on net5501, but 
also on alix boards.


While some of these bugs have been fixed in Linux 3.3, the most critical 
one (random lockup) is still there (and new bugs have sneaked in, such 
as random loss of link).


Two machines sending high rates of UDP packets to each other would make 
the box hang rather fast, often within a few seconds under certain loads.
I made a few fixes to the 3.2.13 driver which seem to have solved the 
hangs as well as another major problem (the one addressed in 3.3), 
namely missed timer ticks due to too much work in the via-rhine 
interrupt handler.

It has now been running for a week under high load without hanging at all.

I've attached a patch, but I don't know if the list server will allow it..

An *important* thing to be aware of is that it doesn't work reliably 
unless MMIO mode (CONFIG_VIA_RHINE_MMIO) is disabled!
I don't know if that is a driver issue or a hardware issue (could be the 
VT6105M chip itself or the PCI bridge inside the Geode causing trouble 
with posted writes or write combining or some such), but in any case I 
don't think anyone could blame Søren for this.
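
The ownership check that the patch below adds to the transmit path can be illustrated with a toy model. This is only a sketch, not the via-rhine code: `ToyTxRing`, its method names and the "ok"/"busy" return strings are made up for illustration, and Python has no equivalent of the wmb()/rmb() barriers, so only the ordering idea is shown.

```python
from dataclasses import dataclass

# Ownership bit: set while the NIC owns the descriptor.
# The bit position is an assumption of this sketch.
DESC_OWN = 0x80000000


@dataclass
class TxDescriptor:
    status: int = 0
    payload: bytes = b""


class ToyTxRing:
    """Hypothetical model of a transmit ring shared by driver and NIC."""

    def __init__(self, size: int) -> None:
        self.ring = [TxDescriptor() for _ in range(size)]
        self.cur_tx = 0

    def start_xmit(self, payload: bytes) -> str:
        entry = self.cur_tx % len(self.ring)
        desc = self.ring[entry]
        # Refuse to reuse a descriptor the NIC has not released yet.
        if desc.status & DESC_OWN:
            return "busy"
        # Fill in everything else first, hand over ownership last,
        # in a single store of the complete status word.
        desc.payload = payload
        desc.status = DESC_OWN
        self.cur_tx += 1
        return "ok"

    def nic_complete(self, entry: int) -> None:
        # The NIC clears the ownership bit once the frame is sent.
        self.ring[entry].status &= ~DESC_OWN


ring = ToyTxRing(2)
results = [ring.start_xmit(p) for p in (b"a", b"b", b"c")]
print(results)
```

With a ring of two entries the third packet is refused until the model NIC has released the first descriptor; a real driver would stop the queue and retry rather than loop.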


Svenning


--- via-rhine.c.orig 2012-03-23 21:54:45.0 +0100
+++ via-rhine.c 2012-04-10 12:22:15.0 +0200
@@ -76,7 +76,8 @@
There are no ill effects from too-large receive rings. */
 #define TX_RING_SIZE   16
 #define TX_QUEUE_LEN   10  /* Limit ring entries actually used. */
-#define RX_RING_SIZE   64
+#define RX_RING_SIZE   16
+#define RX_NAPI_WEIGHT 8
 
 /* Operational parameters that usually are not changed. */
 
@@ -781,6 +782,7 @@
pioaddr = pci_resource_start(pdev, 0);
memaddr = pci_resource_start(pdev, 1);
 
+   pci_set_mwi(pdev);
pci_set_master(pdev);
 
dev = alloc_etherdev(sizeof(struct rhine_private));
@@ -868,7 +870,7 @@
dev->ethtool_ops = &netdev_ethtool_ops,
dev->watchdog_timeo = TX_TIMEOUT;
 
-   netif_napi_add(dev, &rp->napi, rhine_napipoll, 64);
+   netif_napi_add(dev, &rp->napi, rhine_napipoll, RX_NAPI_WEIGHT);
 
if (rp->quirks & rqRhineI)
dev->features |= NETIF_F_SG|NETIF_F_HW_CSUM;
@@ -1019,6 +1021,7 @@
   PCI_DMA_FROMDEVICE);
 
rp->rx_ring[i].addr = cpu_to_le32(rp->rx_skbuff_dma[i]);
+   wmb();
rp->rx_ring[i].rx_status = cpu_to_le32(DescOwn);
}
rp->dirty_rx = (unsigned int)(i - RX_RING_SIZE);
@@ -1475,6 +1478,7 @@
void __iomem *ioaddr = rp->base;
unsigned entry;
unsigned long flags;
+   int txstatus;
 
/* Caution: the write order is important here, set the field
   with the "ownership" bits last. */
@@ -1485,6 +1489,14 @@
if (skb_padto(skb, ETH_ZLEN))
return NETDEV_TX_OK;
 
+   IOSYNC;
+   txstatus = le32_to_cpu(rp->tx_ring[entry].tx_status);
+   rmb();
+   if (unlikely(txstatus & DescOwn)) {
+   netdev_warn(dev, "Tx descriptor busy\n");
+   return NETDEV_TX_BUSY;
+   }
+
rp->tx_skbuff[entry] = skb;
 
if ((rp->quirks & rqRhineI) &&
@@ -1518,17 +1530,17 @@
cpu_to_le32(TXDESC | (skb->len >= ETH_ZLEN ? skb->len : ETH_ZLEN));
 
if (unlikely(vlan_tx_tag_present(skb))) {
-   rp->tx_ring[entry].tx_status = cpu_to_le32((vlan_tx_tag_get(skb)) << 16);
+   txstatus = (vlan_tx_tag_get(skb) << 16) | DescOwn;
/* request tagging */
rp->tx_ring[entry].desc_length |= cpu_to_le32(0x02);
}
else
-   rp->tx_ring[entry].tx_status = 0;
+   txstatus = DescOwn;
 
/* lock eth irq */
spin_lock_irqsave(&rp->lock, flags);
wmb();
-   rp->tx_ring[entry].tx_status |= cpu_to_le32(DescOwn);
+   rp->tx_ring[entry].tx_status = cpu_to_le32(txstatus);
wmb();
 
rp->cur_tx++;
@@ -1634,6 +1646,8 @@
/* find and cleanup dirty tx descriptors */
while (rp->dirty_tx != rp->cur_tx) {
txstatus = le32_to_cpu(rp->tx_ring[entry].tx_status);
+   rmb();
+
if (debug > 6)
netdev_dbg(dev, "Tx scavenge %d status %08x\n",
   entry, txstatus);
@@ -1652,12 +1666,8 @@
dev->stats.tx_aborted_errors++;
if (txstatus & 0x0080)
dev->stats.tx_heartbeat_errors++;
-   if (((rp->quirks & rqRhineI) && txstatus & 0x0002) ||
-   (txstatus & 0x0800) || (txstatus & 0x1000)) {
+   

Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Nenhum_de_Nos

On Sun, April 8, 2012 07:59, Soren Kristensen wrote:
> I still have the problem that nobody running FreeBSD and OpenBSD have
> reported similar issues, somebody correct me if I'm wrong.

I don't know if it counts, as I use a PCI Atheros card. But I have:

FreeBSD cygnus.apartnet 8.1-RELEASE-p4 FreeBSD 8.1-RELEASE-p4 #0: Tue Sep 13 18:02:33 EDT 2011 root@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_wrap.8.i386 i386

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-RELEASE-p4 #0: Tue Sep 13 18:02:33 EDT 2011

root@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_wrap.8.i386 i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Geode(TM) Integrated Processor by AMD PCS (499.90-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x5a2  Family = 5  Model = a  Stepping = 2
  Features=0x88a93d
  AMD Features=0xc040
real memory  = 536870912 (512 MB)
avail memory = 506286080 (482 MB)

and

ath0@pci0:0:14:0:   class=0x02 card=0x3a131186 chip=0x0013168c rev=0x01 hdr=0x00
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 32, base 0xa001, size 65536, enabled

Dlink DWL-520 card

and

hifn0 mem 0xa002-0xa0020fff,0xa0022000-0xa0023fff,0xa0028000-0xa002 irq 15 at device 17.0 on pci0
hifn0: [ITHREAD]
hifn0: Hifn 7955, rev 0, 32KB dram, pll=0x801

and never had a crash for this reason.

Never tested Linux though.

matheus


-- 
We will call you Cygnus,
The God of balance you shall be

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

http://en.wikipedia.org/wiki/Posting_style


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Alan
> Looking though the archieves I found two reported issues, both on Linux.
>
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting
> the Linux VIA VT6105M driver to have bug, and how to fix it:
>
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

I tried this a long time ago and my box was still crashing. But it
could be that it's because of the separate wifi issue.

> 2) And "green" reporting a fix to either ath9k, or all wireless drivers,
> in his post on Jan 25, 2011:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

I have also tried the newest possible wifi drivers and still get
crashes when using the mini-pci slot.


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Sun, 08 Apr 2012 22:54:42 +0200
Soren Kristensen  wrote:


> A quick and easy way to check power supply issues is to lower the TX 
> power on the wlan card, have you tried that ?

Yes. I turned the TX power down until I could hardly get a connection.
My subjective impression was that it helps a tiny bit,
i.e. I think it took a little longer until it crashed.
But the difference was below one minute, so I wouldn't trust that.
 
> > Please not that i had to use different antennas, as the net5501 has
> > no space to drill holes for the TNC connectors. Why you sell them
> > together is beyond me...
> 
> eeh, we don't sell them, Soekris EU, which are a different company, 
> sells them.

Oh. But they use your name. I think you should make clear somewhere
that soekris.eu is not a subsidiary of Soekris. Otherwise people like
me will think that you are selling stuff in Europe directly.

Attila Kinali 


-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Soren Kristensen
Hi Attila,

Sometimes your postings are well thought out, like the one at 12:03 
today, but then you do postings like this where it sounds like you have 
no idea what you're doing.

Attila Kinali wrote:

..
>
> Attached is a picture of a DDR SDRAM chip mounted at the bottom of
> the net5501. For your convenience, I marked the power supply pins red
> (for VDD and VDDQ) and the ground pins blue (for VSS and VSSQ). I
> used this chip because it's one of the best examples of what I want to
> show, and it has very little circuitry around it that would distract or
> make my point less clear.
>
> The first thing that strikes the eye is that there are only 4
> capacitors around the chip, while it has 8 power supply pins. Even
> worse, those 4 capacitors are shared with the two adjacent SDRAM
> chips (effectively halving the number of capacitors "seen" by a
> chip).
>
> The next thing you should notice is that there isn't a via visible
> for each power or ground pin. This suggests that only one via
> (underneath the chip) has been used to connect each pin to its
> power/ground plane. You can also spot places where the same via is
> used for two pins.

..

> That's the theory.
>
> Now to the practical stuff:
>
> Because of the "spiky" current consumption of digital logic it has
> become customary in electronics to attach a 100nF
> capacitor to each power supply pin, to ensure the chip has a
> low inductance and low resistance "power source" for the switching
> time. This has been done since at least the 1970s, when the first
> 74xx logic family appeared. You can still see this in DIL sockets
> sold with integrated 100nF capacitors. The capacitor is connected
> directly between a power and a ground pin if possible, to ensure
> minimal resistance between the capacitor and the chip. You cannot
> group those capacitors together at one pin and just connect the other
> pins to the power supply and ground, because the wires and vias will
> have a resistance and (more importantly) an inductance that cannot be
> neglected. For fast digital chips, which have very high frequency
> components on the power supply pins, it became customary to connect a
> 10nF capacitor directly at the pin and a 100nF one adjacent to it. This
> is because even those tiny capacitors have an inductance, and due to their
> internal structure this inductance becomes dominant above the
> so-called self-resonance frequency. This self-resonance frequency is
> higher for smaller value capacitors, making them better suited for
> high frequency applications. The larger capacitor is then used to
> provide the energy, while the smaller "eats" the spikes.
>
> Also, for high current chips like SDRAM chips, you generally use a
> larger capacitor (somewhere in the range of 1-10uF) adjacent to the
> chip, to catch the lower frequency components, or the bumps so to
> speak, that the 100nF capacitors couldn't catch. The placement of
> this capacitor is not so critical, as it is "only" for the "low"
> frequency components. But it should still be as near to the chip as
> possible, with one capacitor per chip.
>
> Additionally, each power supply and ground pin is connected to its
> plane in the middle of the board by two vias. This is done to reduce
> the inductance that a via has. Using two vias in parallel halves the
> inductance.
>
> Ignoring these common engineering practices is generally a bad idea.
> It will lead to so-called ground bounce, where the local power
> supply voltage at the chip decreases due to inductance and
> resistance in the wires/vias to the chip. And even worse: because the
> inductance/resistance at the power supply and ground pins is not the
> same, the chip's voltage level will bounce around wildly depending on
> how much current is flowing where. Ground bounce leads at best
> to a decreased signal to noise ratio (higher bit error rate) and
> in the intermediate case to bit errors. But in the worst case, it will
> lead to the chip entering an improper operating state, where it becomes
> dysfunctional (either not doing anything anymore or doing wild things
> it shouldn't do, potentially leading to the destruction of itself or
> other chips).
>
> You also do not share power supply and ground pins between chips unless
> you can ensure that they switch at different times. In this
> case, the SDRAM chips will switch at exactly the same time, making the
> ground bounce problem even worse.
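
Two of the rules of thumb quoted above can be given rough numbers. The following is only a sketch under stated assumptions: the parasitic inductance of a capacitor and the inductance of a via are both taken as 1 nH, which is a guess for illustration and not a value measured on the net5501, and the two vias are treated as uncoupled, which real vias placed close together are not.

```python
import math


def self_resonant_frequency_hz(capacitance_f: float, esl_h: float) -> float:
    """Series self-resonance of a capacitor with parasitic inductance esl_h."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * capacitance_f))


def parallel_inductance_h(inductance_h: float, count: int) -> float:
    """Inductance of `count` identical, uncoupled inductors in parallel."""
    return inductance_h / count


# Assumptions for illustration only, not taken from the thread:
ASSUMED_ESL_H = 1e-9      # parasitic inductance of capacitor plus mounting
ASSUMED_VIA_L_H = 1e-9    # inductance of a single via

for label, capacitance in (("100nF", 100e-9), ("10nF", 10e-9)):
    f = self_resonant_frequency_hz(capacitance, ASSUMED_ESL_H)
    print(f"{label}: self resonance at roughly {f / 1e6:.1f} MHz")

two_vias = parallel_inductance_h(ASSUMED_VIA_L_H, 2)
print(f"one via: {ASSUMED_VIA_L_H * 1e9:.2f} nH, two vias: {two_vias * 1e9:.2f} nH")
```

For a fixed parasitic inductance the self-resonance frequency scales with 1/sqrt(C), so the 10nF part resonates about 3.2 times higher than the 100nF part, whatever the actual inductance is.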

Sorry Attila, but you can't really apply design techniques from the TTL 
era to modern high speed designs with power planes. Nowadays it's all a 
question of two things:

1) Inductance from planes to the chip die itself.
2) The planes' impedance from low to very high frequencies.

It doesn't really matter how many capacitors there are close to the chip 
if the pins are connected to low impedance planes. And you can safely 
share vias if needed; the reason a DDR DRAM chip has many power pins is 
to reduce the inductance from plane to chip, on a 

Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Soren Kristensen
Hi Attila,

Attila Kinali wrote:
>
> Could be. I think that's unlikely. I rather think that the wlan card
> has a spiky power need, probably drawin ~300mA for very short periods.
> The VIA6105M should have about the same spikyness, but with less power
> consumption. As both share their power supply vias and capacitors, it's
> only a matter of time that both spike at the same time, moving one or
> both subsystems into a operating condition outside the specs.
>
>> I would also like to investigate the problem further. Can you please
>> tell me the exact wlan card ?
>
> I bought the following from soekris.eu:
>
> Complete wireless Bundle 9220-2
> Power Supply, 12V, 3.0A, IEC320-C8 inlet 90V-264V Worldwide
> 2.5" SATA hard drive mounting kit for the net5501
> net5501-70 Board and 1 Slot standard Case

According to their website, it's a Compex WLM200NX. Those are 
specified to draw up to 2.5W during TX.

Worst case peak should then be 0.75A, assuming the card has reasonably 
good high speed decoupling, which is well within the net5501 design specs.
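
The 0.75A figure is consistent with 2.5W drawn from a 3.3V rail. A minimal check, assuming the card is fed from the 3.3V supply (the post does not say which rail is meant):

```python
def supply_current_a(power_w: float, supply_v: float) -> float:
    """Current drawn from a rail of supply_v volts at power_w watts."""
    return power_w / supply_v


TX_POWER_W = 2.5          # figure quoted for the WLM200NX during TX
ASSUMED_SUPPLY_V = 3.3    # assumption: card powered from the 3.3V rail

peak_current_a = supply_current_a(TX_POWER_W, ASSUMED_SUPPLY_V)
print(f"peak supply current: {peak_current_a:.2f} A")
```

This is an average over the transmit burst; it says nothing about how spiky the draw is within a burst, which depends on the decoupling on the card itself.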

A quick and easy way to check power supply issues is to lower the TX 
power on the wlan card. Have you tried that?

> Please not that i had to use different antennas, as the net5501 has
> no space to drill holes for the TNC connectors. Why you sell them
> together is beyond me...

eeh, we don't sell them; Soekris EU, which is a different company, 
sells them.


Best Regards,


Soren Kristensen

CEO & Chief Engineer
Soekris Engineering, Inc.




Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Bjarke Istrup Pedersen
2012/4/8 Soren Kristensen :
> Hi Attila,
>
> Attila Kinali wrote:
>> On Sun, 08 Apr 2012 01:34:58 +0200
>> Soren Kristensen  wrote:
>>
>>> Alvar Kusma wrote:

> As I have stated before, afaik the net5501 do not have any design
> issues, Attila's problem is most likely either software related,

 Please, can you explain, why similar board from PCEngines (Alix 2D13)
 with same software (OpenWRT image) just works, but Soekris board shows
 some unstability? Can you explain, why this same exact software works on
 one net5501 without a glitch over year now, but two other units show
 unstability signs - random hangs, sometimes works over month, sometimes
 crashes 2 times a day? This is still a mystery for me. Just bad luck?
>>>
>>> No, pretty simple:
>>>
>>> The Linux VT6105M driver has interrupt race problems, reported to have
>>> been fixed recently, don't know if it have ported to the main Linux sources.
>>>
>>> The Atheros wlan drivers seems to also have interrupt race problems,
>>> don't remember if that have been fixed too.
>>
>> You repeat this argument over and over. But apearently, you are the only
>> one who knows about these race conditions. I cannot find any reference
>> to the race condition on the VT6105M at all. And for the ath9k race,
>> the only one i could find was fixed october 2010 in the mainline kernel.
>> Can you provide us with references to what race conditions you mean and
>> where they are to be found?
>
>  From my post on 12/7/2011:
>
> Looking though the archieves I found two reported issues, both on Linux.
>
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting
> the Linux VIA VT6105M driver to have bug, and how to fix it:
>
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

Hey,

I tried submitting a patch containing those two lines upstream,
which resulted in some work from Francois Romieu that fixes it the right
way. (See the mail I sent to this list on January 22 requesting help with
testing, which nobody replied to.)
Those patches were merged into mainline in Linux 3.3-rc1, so version
3.3 and later contain those fixes, which help fix the interrupt
crashes.

So everyone using a kernel below version 3.3 and complaining about
crashes really should have read their mail and tested with
something newer - they had been notified :)

/Bjarke

> 2) And "green" reporting a fix to either ath9k, or all wireless drivers,
> in his post on Jan 25, 2011:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html
>
>>
>> But to kill that driver bug argument once and for all please explain the
>> following which i've seen during my test:
>>
>> Setup:
>> net5501 running debian/stable with a self build vanilla linux kernel
>> version 3.2.1. Connected to the net5501 are a notebook sata harddisk
>> and AR9200 wlan card. The LAN is connected on eth0.
>>
>> If the WLAN card is _not_ running (driver not loaded or disbaled by rfkill)
>> no problems can be seen. No crashes, nothing. For months.
>>
>> Test #1:
>> Setup as above. WLAN card enabled, traffic going trough both WLAN and eth0.
>> Result: System crashes in 2minutes (+/- 1 minute). No Oops, as would be
>> seen with most driver bugs on the serial console. It just hangs.
>>
>> Test #2:
>> Setup and test procedure as in Test #1, but with two 1000uF capacitors
>> connected to J5 at 5V and 3.3V power supplies.
>> Result: System crashes in 5minutes (-1min, +2min).
>>
>> Test #3:
>> Setup nd test procedure as in Test #2, but with three dozen ceramic 
>> capacitors
>> soldered on the board.
>> Result: No crash at all after one week. Even heavy system load doesn't
>> affect the system anymore.
>>
>>
>> Notes:
>> 1) Test #1 and Test #2 were repeated several dozen times. Although i have not
>> writen down the times it takes to crash the system and didnt do a
>> mathematically rigourus statistical analysis, i can state that the
>> difference between Test #1 and #2 is significant. Ie the additional 
>> capacitors
>> improve the situation considerably.
>>
>> 2) I run Test #1 and #2 before i did the modifications for Test #3 to ensure
>> the bug is still present and can be reproduced. I did not do any software
>> upgrades or any configuration changes in between. Ie if it would be a 
>> software
>> bug, it would be present in all three tests.
>>
>>
>> Soren, if you really have an explenation how a software bug (a race
>> condition as you say) can be fixed with a soldering iron, i really like
>> to hear that. I have systems that experience race conditions under
>> every once in a while and i'd like to fix those as well with my soldering
>> iron.
>
> Attila, thanks for the detailed testing done. I agreed with you that
> adding capacitors should not change behavior if it's a software problem
> alone.
>
> I will still state that the net5501 has the decoupling it needs for
> itself and the 

Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Jan Ceuleers
On 08/04/12 13:22, Attila Kinali wrote:
> Could be. I think that's unlikely. I rather think that the wlan card
> has a spiky power need, probably drawin ~300mA for very short periods.
> The VIA6105M should have about the same spikyness, but with less power
> consumption. As both share their power supply vias and capacitors, it's
> only a matter of time that both spike at the same time, moving one or
> both subsystems into a operating condition outside the specs.

When I had stability problems with my net5501 I solved them by no longer 
using the first two ethernet ports. I now only use eth2 and eth3 and the 
box is rock solid.

http://lists.soekris.com/pipermail/soekris-tech/2010-December/016953.html

As you can see I reported this problem solved in December 2010 (by no 
longer using the first two Ethernet ports). I've not looked back since; 
I could test again if anyone's interested.

I have an Atheros (ath9k) wireless card:

00:11.0 Network controller: Atheros Communications Inc. AR922X Wireless 
Network Adapter (rev 01)
Subsystem: Atheros Communications Inc. Device 2096
Flags: bus master, 66MHz, medium devsel, latency 168, IRQ 15
Memory at a001 (32-bit, non-prefetchable) [size=64K]
Capabilities: [44] Power Management version 2
Kernel driver in use: ath9k
Kernel modules: ath9k

Currently using kernel 2.6.32-40-generic; I'll be upgrading to Ubuntu 
12.04 LTS once it's been out for a while.

Jan


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Aragon Gouveia
On 04/08/12 13:26, Alvar Kusma wrote:
>> I still have the problem that nobody running FreeBSD and OpenBSD have
>> reported similar issues, somebody correct me if I'm wrong.
>
> Just to point one:
>
> http://lists.soekris.com/pipermail/soekris-tech/2011-December/017988.html

You can add another to the list too - FreeBSD 8.x, no WLAN card, just 
the DP83816 4 port card.  Happy to provide more anecdotal evidence if 
requested.  Wish I could offer more.

I say it would be nice if Søren and Attila could collaborate more 
closely.  Relying on watchdogs feels like such a dirty hack.  C'mon guys, 
this is science! :)


Regards,
Aragon


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Wed, 4 Apr 2012 11:01:51 +0900
Alan  wrote:

> On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali  wrote:
> 
> > I finaly got the time to work on this again and got my net5501
> > working without crashes even under heavy load and using wlan at full
> > power. At least not within 24h.
> 
> Great that someone is working on this, but 24 hours is not much.  If I
> recall correctly one of my net5501 could go up to 2 weeks (of light
> use) without crashing.

Oh.. compared to 2 minutes (!) for an unmodified board, or 5 minutes with
only two 1000uF electrolytic capacitors connected to J5, it's pretty impressive.
Besides, I only did the modifications last weekend, so I didn't have
the time to let it run for longer. Now I have had it running for almost a week
with no crashes.
 
> Like others have said, I am eagerly waiting for the description,
> schematics and pictures to fix this.

I didn't take any pictures of the modified board, but I can tell you
what the cause is in more detail:


The power supply of the net5501, and the way it is distributed to the
circuitry, disregards all common good design practices. This leads to problems
under certain load and use conditions. These problems can be bit errors or
complete crashes.


Attached is a picture of a DDR SDRAM chip mounted at the bottom of the
net5501. For your convenience, i marked the power supply pins red (for
VDD and VDDQ) and the ground pins blue (for VSS and VSSQ).
I used this chip because it's one of the best examples of what i want to
show, and it has very little circuitry around it that would distract
or make my point less clear.

The first thing that strikes the eye is that there are only 4 capacitors
around the chip while it has 8 power supply pins. Even worse, those
4 capacitors are shared with the two adjacent SDRAM chips (effectively
halving the number of capacitors "seen" by a chip).

The next thing you should notice is that there isn't a via visible for
each power or ground pin. This suggests that only one via (underneath
the chip) has been used to connect the pin to its power/ground plane.
You can also spot places where the same via is used for two pins.

Now, what does that mean? 
Digital chips are beasts in terms of power supply. Most of the time,
they do not draw any power (at least nothing you'd talk about), but
when the clock switches from high to low (or low to high, or both),
they draw a huge amount of power. One part of that power is used to
switch the transistors inside the chip, another part goes into the
switching of the output pins. Simplified, you can see the internal
circuitry and the output pins as a CMOS inverter [1]. When A changes
its logic level, there is first the gate capacitance that has to be
charged/discharged. Second, there is a very short period when both
transistors are conducting, leading to the so-called shoot-through
current. This current is limited by the conductance
of the transistors themselves. For internal circuits it's quite low
(they don't have to conduct huge currents), but due to the number of
transistors switching at the same time, it cannot be neglected.
For the output pins it's a different matter. They are designed to
provide large currents (at least 16mA per pin in the DDR SDRAM case).
So the shoot-through current is significant for each pin and the situation
becomes worse when multiple pins are switching at the same time.
Please keep in mind that the shoot-through current lasts only for a
very short period of time, typically less than 1ns. On one hand, this
helps, as only little energy is lost to the shoot-through. But on
the other hand, it leads to very high frequency components.
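
To put a rough number on "very high frequency components", here is a minimal
sketch in Python. It uses the common rule of thumb that the significant
bandwidth of an edge is about 0.35 divided by its rise time. That rule and the
function name are assumptions added for illustration; only the "less than 1ns"
figure comes from the text above.

```python
def bandwidth_from_edge(t_edge_s: float) -> float:
    """Rule-of-thumb bandwidth in Hz for an edge of t_edge_s seconds.

    Uses BW ~ 0.35 / t_rise, which strictly holds for a single-pole
    system and is only an estimate for real logic edges.
    """
    return 0.35 / t_edge_s


# 1 ns is the upper bound quoted in the text; the other values are
# hypothetical and only show how the estimate scales.
for t_ns in (5.0, 1.0, 0.5):
    f_hz = bandwidth_from_edge(t_ns * 1e-9)
    print(f"{t_ns} ns edge -> roughly {f_hz / 1e6:.0f} MHz")
```

Under this estimate a sub-nanosecond current pulse has significant content in
the hundreds of MHz, where the inductance of traces and vias matters much
more than their DC resistance.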

The next big power hog comes from the capacitance connected to the chip.
Each pin of a chip package has a capacitance on the order of 1-20pF
(DDR SDRAM chips have a specified pin capacitance of <5pF).
I.e. you have two pin capacitances (the "sender" and the "receiver" chip)
and the capacitance of the wire connected to the pin.
Each time an output pin switches high->low or low->high, this capacitance
has to be charged/discharged. I.e. during this short period a current of
approximately 16mA is flowing through the pin. (Again: think about
multiple pins switching at the same time.)
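
The load-capacitance argument can be sketched with t = C*V/I. The 5pF pin
capacitance and the 16mA drive current are the figures from the text; the
trace capacitance and the voltage swing are assumptions chosen only to make
the example concrete.

```python
# Values taken from the text above
PIN_CAPACITANCE_F = 5e-12    # upper bound quoted for one pin
DRIVE_CURRENT_A = 16e-3      # per-pin drive current

# Assumptions made for this sketch only
TRACE_CAPACITANCE_F = 5e-12  # hypothetical short PCB trace
VOLTAGE_SWING_V = 2.5        # assumed I/O voltage swing


def load_capacitance(pin_f, trace_f):
    """Driver pin plus receiver pin plus the trace between them."""
    return 2 * pin_f + trace_f


def switching_time(c_load_f, v_swing_v, i_drive_a):
    """Time to move c_load through v_swing at constant current: t = C*V/I."""
    return c_load_f * v_swing_v / i_drive_a


def peak_current(n_pins, i_drive_a):
    """Worst case when n_pins switch in the same direction at once."""
    return n_pins * i_drive_a


c_load = load_capacitance(PIN_CAPACITANCE_F, TRACE_CAPACITANCE_F)
t_switch = switching_time(c_load, VOLTAGE_SWING_V, DRIVE_CURRENT_A)
print(f"load capacitance: {c_load * 1e12:.0f} pF")
print(f"switching time:   {t_switch * 1e9:.2f} ns")
for n in (1, 8, 16):
    peak_ma = peak_current(n, DRIVE_CURRENT_A) * 1e3
    print(f"{n:2d} pins switching: {peak_ma:.0f} mA peak")
```

The point of the sketch is the scaling: each pin only needs its current for a
few nanoseconds, but the peak adds up linearly with the number of pins that
switch together.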

That's the theory.

Now to the practical stuff:

Because of the "spiky" current consumption of digital logic it has become
customary in the field of electronics to attach a 100nF capacitor to each
power supply pin, to ensure each pin has a low inductance
and low resistance "power source" for the switching time. This has been
done since at least the 1970s, when the first 74xx logic family appeared.
You can see this still in DIL sockets sold with integrated 100nF capacitors.
The capacitor is connected directly between a power and a ground pin if
possible, to ensure minimal resistance between the capacitor and the
chip. You cannot group those capacitors together at one pin and just
connect the other pins to the power suppl
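
A rough way to see why the capacitor has to sit close to each supply pin is to
compare two voltage drops: the droop of the capacitor itself (dV = I*t/C) and
the drop across the inductance between capacitor and pin (V = L*dI/dt). The
100nF, 16mA and 1ns figures come from the text; the 2nH path inductance and
the count of 8 switching pins are assumptions for illustration only.

```python
def capacitor_droop(c_f, i_a, delta_t_s):
    """Voltage lost by a capacitor delivering i_a for delta_t_s: dV = I*t/C."""
    return i_a * delta_t_s / c_f


def inductive_drop(l_h, delta_i_a, delta_t_s):
    """Voltage across an inductance for a current step: V = L * dI/dt."""
    return l_h * delta_i_a / delta_t_s


DECOUPLING_CAP_F = 100e-9   # from the text
PER_PIN_CURRENT_A = 16e-3   # from the text
PULSE_TIME_S = 1e-9         # "less than 1ns" from the text

N_PINS = 8                  # assumption: pins switching together
PATH_INDUCTANCE_H = 2e-9    # assumption: via plus short trace

spike_a = N_PINS * PER_PIN_CURRENT_A
droop_v = capacitor_droop(DECOUPLING_CAP_F, spike_a, PULSE_TIME_S)
drop_v = inductive_drop(PATH_INDUCTANCE_H, spike_a, PULSE_TIME_S)
print(f"current spike:          {spike_a * 1e3:.0f} mA")
print(f"capacitor droop:        {droop_v * 1e3:.2f} mV")
print(f"drop across inductance: {drop_v * 1e3:.0f} mV")
```

With these assumed values the capacitor barely notices the spike, while the
inductance in series with it drops far more voltage. Sharing one via or one
capacitor between several pins puts more current through the same inductance,
which is the situation the text criticizes.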

Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Lev Serebryakov
Hello, Soren.
You wrote on 8 April 2012, 14:59:21:


> I still have the problem that nobody running FreeBSD and OpenBSD have
> reported similar issues, somebody correct me if I'm wrong.
  It seems that I have some interference between eth0 and the WiFi card in
 the MiniPCI slot, but I've always attributed it to heat and/or a damaged
 eth0 port. I'll try to use it with the MiniPCI card removed in the near
 future (with the card installed, eth0 cannot get a link from my provider's
 switch over a long cable, while eth1 works without any problems. Also, eth0
 works with my home switch over a short 1m cable).

-- 
// Black Lion AKA Lev Serebryakov 



Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Lev Serebryakov
Hello, Attila.
You wrote on 8 April 2012, 14:10:22:

> Do you use eth0? If yes, try not using it. The VT6105M of eth0 shares
> its power supply pins and capacitors with the mini PCI interface.
> If your crashes are so rare, it might be enough that not using that
> interface is enough to get you into the regime that doesnt crash.
  That sounds interesting. I could not use WiFi in the MiniPCI slot together
 with eth0 on my net5501 (eth0 could not establish a link at all), but I was
 sure that this was the result of a lightning strike, as I've used eth0 for
 the "upstream" provider connection (the cable runs directly from the
 provider's switch, which is located under the roof of my 10-storey
 multi-apartment building, while my apartment is on the 7th floor)...

-- 
// Black Lion AKA Lev Serebryakov 



Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Alvar Kusma

> I still have the problem that nobody running FreeBSD and OpenBSD have 
> reported similar issues, somebody correct me if I'm wrong.

Just to point one:

http://lists.soekris.com/pipermail/soekris-tech/2011-December/017988.html


-- 
==


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Sun, 08 Apr 2012 12:59:21 +0200
Soren Kristensen  wrote:

> Attila Kinali wrote:
> > You repeat this argument over and over. But apearently, you are the only
> > one who knows about these race conditions. I cannot find any reference
> > to the race condition on the VT6105M at all. And for the ath9k race,
> > the only one i could find was fixed october 2010 in the mainline kernel.
> > Can you provide us with references to what race conditions you mean and
> > where they are to be found?
> 
>  From my post on 12/7/2011:
> 
> Looking though the archieves I found two reported issues, both on Linux.
> 
> 1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting 
> the Linux VIA VT6105M driver to have bug, and how to fix it:
> 
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
> http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

I cannot tell whether this has been fixed or not. The via-rhine driver
irq handling has seen a big overhaul since then. It could be that it's
fixed, it could be that it's not. Unfortunately, the reporter did not
mention which kernel version he was using, hence i cannot even check
whether the code path is now properly locked or not, as i can only guess
what changes he made exactly (a proper patch would have been helpful).

 
> 2) And "green" reporting a fix to either ath9k, or all wireless drivers, 
> in his post on Jan 25, 2011:
> 
> http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

Here again, no mention of the kernel version that has been used.
It could very well be that this is the race condition i found which has
been fixed in oct 2010. Please note that "compat-wireless" is the development
version of the wifi driver team, released for the benefit of those who
do not want to wait for the mainline kernel to pick up the changes they did.

As i said, my kernel has the version 3.2.1, which was released
on Jan 12, 2012. This is considerably newer than any of those reports.
I am sure that the ath9k fix has entered the mainline kernel since then
(they sync up on every second kernel release at least). And the other one
probably too if it has been properly reported to the kernel developers.


> > Soren, if you really have an explenation how a software bug (a race
> > condition as you say) can be fixed with a soldering iron, i really like
> > to hear that. I have systems that experience race conditions under
> > every once in a while and i'd like to fix those as well with my soldering
> > iron.
> 
> Attila, thanks for the detailed testing done. I agreed with you that 
> adding capacitors should not change behavior if it's a software problem 
> alone.

Finally you believe me
 
> I will still state that the net5501 has the decoupling it needs for 
> itself and the expansions it's designed for. One possible sources of 
> problem could be the power supply regulators as they located just behind 
> the mini-PCI slot, RF could be affecting t.ex. the compensation circuit, 
> so adding decoupling capacitors just fix the symptoms.

Could be. I think that's unlikely. I rather think that the wlan card
has a spiky power need, probably drawing ~300mA for very short periods.
The VT6105M should have about the same spikiness, but with less power
consumption. As both share their power supply vias and capacitors, it's
only a matter of time until both spike at the same time, moving one or
both subsystems into an operating condition outside the specs.
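
The "only a matter of time" argument can be illustrated with a simple
coincidence estimate for two independent pulse trains. This is a sketch under
strong assumptions: the pulse rates and widths below are hypothetical (none
were measured in this thread), source B is modelled as a Poisson process, and
not every coincidence necessarily causes a crash.

```python
import math


def overlap_rate(rate_a_hz, width_a_s, rate_b_hz, width_b_s):
    """Expected coincidences per second between pulses of A and B.

    A pulse of A overlaps a pulse of B if some B pulse starts within a
    window of (width_a + width_b) around it. For Poisson-distributed B
    pulses that probability is 1 - exp(-rate_b * window).
    """
    window_s = width_a_s + width_b_s
    p_overlap = 1.0 - math.exp(-rate_b_hz * window_s)
    return rate_a_hz * p_overlap


# Hypothetical numbers: 1000 current spikes per second from each
# source, each spike lasting 1 microsecond.
rate = overlap_rate(1000.0, 1e-6, 1000.0, 1e-6)
print(f"coincidences per second: {rate:.2f}")
print(f"mean time between coincidences: {1.0 / rate:.2f} s")
```

Even though each source is active only a tiny fraction of the time under
these assumed numbers, coincidences are expected regularly, which is
qualitatively consistent with crashes that take minutes rather than months.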

> I would also like to investigate the problem further. Can you please 
> tell me the exact wlan card ?

I bought the following from soekris.eu:

Complete wireless Bundle 9220-2
Power Supply, 12V, 3.0A, IEC320-C8 inlet 90V-264V Worldwide
2.5" SATA hard drive mounting kit for the net5501
net5501-70 Board and 1 Slot standard Case

Please note that i had to use different antennas, as the net5501 has
no space to drill holes for the TNC connectors. Why you sell them
together is beyond me...

> And can you please ensure that the vt6105 driver is updated to a fixed 
> one, would really love data after that is done

As i said, i cannot. But i'm using a pretty recent kernel. If it
isn't fixed yet after 2 years...

> I still have the problem that nobody running FreeBSD and OpenBSD have 
> reported similar issues, somebody correct me if I'm wrong.

Have you considered that it might be because there are many more people
using linux than *bsd? Hence reports with linux are much more likely.

Attila Kinali

-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Sun, 8 Apr 2012 12:49:17 +0200
Flemming Jacobsen  wrote:

> Attila Kinali wrote:
> > If i had a software problem, why does the behaviour change if
> > i connect two 1000uF capacitors to the powersupply pins available
> > at J5? And how come the whole issue disapears after soldering three
> > dozen of capacitors at various places?
> 
> This screams "My PSU was sub par for the load I gave it". Or
> (less likely) "I overloaded the onboard power converter".

Unfortunately, even a 90W laptop power supply was sub par for the
load a single net5501 gave it...

And Soren himself said that the internal power supply is designed
for 3.5A on each rail. So that should be more than good enough...

 
> And adding 30+ capacitors to the board at random, speaks volumes
> about the level of EE knowledge this individual has.

Who said that i added the capacitors at random? You don't have
to believe me, but i know what i'm doing[1]. I added the capacitors
where i thought they would have the most impact on system/power supply
stability.

Attila Kinali


[1] As i've previously stated already, i design boards of the level of
the net5501 for a living. I know where the usual problems are and i know
how to get around them. 
-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Soren Kristensen
Hi Attila,

Attila Kinali wrote:
> On Sun, 08 Apr 2012 01:34:58 +0200
> Soren Kristensen  wrote:
>
>> Alvar Kusma wrote:
>>>
>>>> As I have stated before, afaik the net5501 do not have any design
>>>> issues, Attila's problem is most likely either software related,
>>>
>>> Please, can you explain, why similar board from PCEngines (Alix 2D13)
>>> with same software (OpenWRT image) just works, but Soekris board shows
>>> some unstability? Can you explain, why this same exact software works on
>>> one net5501 without a glitch over year now, but two other units show
>>> unstability signs - random hangs, sometimes works over month, sometimes
>>> crashes 2 times a day? This is still a mystery for me. Just bad luck?
>>
>> No, pretty simple:
>>
>> The Linux VT6105M driver has interrupt race problems, reported to have
>> been fixed recently, don't know if it have ported to the main Linux sources.
>>
>> The Atheros wlan drivers seems to also have interrupt race problems,
>> don't remember if that have been fixed too.
>
> You repeat this argument over and over. But apearently, you are the only
> one who knows about these race conditions. I cannot find any reference
> to the race condition on the VT6105M at all. And for the ath9k race,
> the only one i could find was fixed october 2010 in the mainline kernel.
> Can you provide us with references to what race conditions you mean and
> where they are to be found?

 From my post on 12/7/2011:

Looking through the archives I found two reported issues, both on Linux.

1) The thread in Sept/Oct 2010, concluding with Andrey Safonov reporting 
that the Linux VIA VT6105M driver has a bug, and how to fix it:

http://lists.soekris.com/pipermail/soekris-tech/2010-October/016884.html
http://lists.soekris.com/pipermail/soekris-tech/2010-October/016889.html

2) And "green" reporting a fix to either ath9k, or all wireless drivers, 
in his post on Jan 25, 2011:

http://lists.soekris.com/pipermail/soekris-tech/2011-January/017001.html

>
> But to kill that driver bug argument once and for all please explain the
> following which i've seen during my test:
>
> Setup:
> net5501 running debian/stable with a self build vanilla linux kernel
> version 3.2.1. Connected to the net5501 are a notebook sata harddisk
> and AR9200 wlan card. The LAN is connected on eth0.
>
> If the WLAN card is _not_ running (driver not loaded or disbaled by rfkill)
> no problems can be seen. No crashes, nothing. For months.
>
> Test #1:
> Setup as above. WLAN card enabled, traffic going trough both WLAN and eth0.
> Result: System crashes in 2minutes (+/- 1 minute). No Oops, as would be
> seen with most driver bugs on the serial console. It just hangs.
>
> Test #2:
> Setup and test procedure as in Test #1, but with two 1000uF capacitors
> connected to J5 at 5V and 3.3V power supplies.
> Result: System crashes in 5minutes (-1min, +2min).
>
> Test #3:
> Setup nd test procedure as in Test #2, but with three dozen ceramic capacitors
> soldered on the board.
> Result: No crash at all after one week. Even heavy system load doesn't
> affect the system anymore.
>
>
> Notes:
> 1) Test #1 and Test #2 were repeated several dozen times. Although i have not
> writen down the times it takes to crash the system and didnt do a
> mathematically rigourus statistical analysis, i can state that the
> difference between Test #1 and #2 is significant. Ie the additional capacitors
> improve the situation considerably.
>
> 2) I run Test #1 and #2 before i did the modifications for Test #3 to ensure
> the bug is still present and can be reproduced. I did not do any software
> upgrades or any configuration changes in between. Ie if it would be a software
> bug, it would be present in all three tests.
>
>
> Soren, if you really have an explenation how a software bug (a race
> condition as you say) can be fixed with a soldering iron, i really like
> to hear that. I have systems that experience race conditions under
> every once in a while and i'd like to fix those as well with my soldering
> iron.

Attila, thanks for the detailed testing done. I agree with you that 
adding capacitors should not change behavior if it's a software problem 
alone.

I will still state that the net5501 has the decoupling it needs for 
itself and the expansions it's designed for. One possible source of the 
problem could be the power supply regulators, as they are located just 
behind the mini-PCI slot. RF could be affecting e.g. the compensation 
circuit, so adding decoupling capacitors would just fix the symptoms.

I would also like to investigate the problem further. Can you please 
tell me the exact wlan card?

And can you please ensure that the vt6105 driver is updated to a fixed 
one? I would really love data after that is done.

I still have the problem that nobody running FreeBSD or OpenBSD has 
reported similar issues, somebody correct me if I'm wrong.


Best Regards,


Soren Kristensen

CEO & Chief Engineer
Soekris Engineering, Inc.

Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Alvar Kusma
> Do you use eth0? If yes, try not using it. The VT6105M of eth0 shares
> its power supply pins and capacitors with the mini PCI interface.

This is the reason i obtained net5501-s - the need for more than 3 ethernet 
NIC-s on a router, _real_ NIC-s, and not those vlan capable 5 port 
internal "smart switches" found on almost every 70€ SOHO router. No 
need for wlan, no need for the miniPCI slot... just a plain multi-ethernet router.

-- 
==


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Flemming Jacobsen
Attila Kinali wrote:
> If i had a software problem, why does the behaviour change if
> i connect two 1000uF capacitors to the powersupply pins available
> at J5? And how come the whole issue disapears after soldering three
> dozen of capacitors at various places?

This screams "My PSU was sub par for the load I gave it". Or
(less likely) "I overloaded the onboard power converter".

And adding 30+ capacitors to the board at random, speaks volumes
about the level of EE knowledge this individual has.


Regards,
Flemming

-- 
Flemming Jacobsen  Email: f...@batmule.dk

"People are more violently opposed to fur than leather because it's safer
to harass rich women than motorcycle gangs."  -- Unknown


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Sat, 07 Apr 2012 23:03:43 +0300
Alvar Kusma  wrote:

> > As I have stated before, afaik the net5501 do not have any design 
> > issues, Attila's problem is most likely either software related, 
> 
> Please, can you explain, why similar board from PCEngines (Alix 2D13) 
> with same software (OpenWRT image) just works, but Soekris board shows 
> some unstability? Can you explain, why this same exact software works on 
> one net5501 without a glitch over year now, but two other units show 
> unstability signs - random hangs, sometimes works over month, sometimes 
> crashes 2 times a day? This is still a mystery for me. Just bad luck?

Do you use eth0? If yes, try not using it. The VT6105M of eth0 shares
its power supply pins and capacitors with the mini PCI interface.
If your crashes are so rare, not using that interface might be enough
to get you into a regime that doesn't crash.

Attila Kinali
-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Sun, 08 Apr 2012 01:34:58 +0200
Soren Kristensen  wrote:

> Alvar Kusma wrote:
> >
> >> As I have stated before, afaik the net5501 do not have any design
> >> issues, Attila's problem is most likely either software related,
> >
> > Please, can you explain, why similar board from PCEngines (Alix 2D13)
> > with same software (OpenWRT image) just works, but Soekris board shows
> > some unstability? Can you explain, why this same exact software works on
> > one net5501 without a glitch over year now, but two other units show
> > unstability signs - random hangs, sometimes works over month, sometimes
> > crashes 2 times a day? This is still a mystery for me. Just bad luck?
> 
> No, pretty simple:
> 
> The Linux VT6105M driver has interrupt race problems, reported to have 
> been fixed recently, don't know if it have ported to the main Linux sources.
> 
> The Atheros wlan drivers seems to also have interrupt race problems, 
> don't remember if that have been fixed too.

You repeat this argument over and over. But apparently, you are the only
one who knows about these race conditions. I cannot find any reference
to the race condition on the VT6105M at all. And for the ath9k race,
the only one i could find was fixed in October 2010 in the mainline kernel.
Can you provide us with references to what race conditions you mean and
where they are to be found?


But to kill that driver bug argument once and for all please explain the
following which i've seen during my test:

Setup:
net5501 running debian/stable with a self-built vanilla linux kernel
version 3.2.1. Connected to the net5501 are a notebook sata harddisk
and an AR9200 wlan card. The LAN is connected on eth0.

If the WLAN card is _not_ running (driver not loaded or disabled by rfkill)
no problems can be seen. No crashes, nothing. For months.

Test #1:
Setup as above. WLAN card enabled, traffic going through both WLAN and eth0.
Result: System crashes in 2 minutes (+/- 1 minute). No Oops on the serial
console, as would be seen with most driver bugs. It just hangs.

Test #2:
Setup and test procedure as in Test #1, but with two 1000uF capacitors
connected to J5 at the 5V and 3.3V power supplies.
Result: System crashes in 5 minutes (-1min, +2min).

Test #3:
Setup and test procedure as in Test #2, but with three dozen ceramic capacitors
soldered on the board.
Result: No crash at all after one week. Even heavy system load doesn't
affect the system anymore.


Notes:
1) Test #1 and Test #2 were repeated several dozen times. Although i have not
written down the times it takes to crash the system and didn't do a 
mathematically rigorous statistical analysis, i can state that the
difference between Test #1 and #2 is significant. I.e. the additional capacitors
improve the situation considerably.

2) I ran Test #1 and #2 before i did the modifications for Test #3 to ensure
the bug was still present and could be reproduced. I did not do any software
upgrades or any configuration changes in between. I.e. if it were a software
bug, it would be present in all three tests.
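
For anyone who does record the crash times, one simple way to check whether
the difference between two test setups is significant is a permutation test.
The sketch below is an illustration only: the function name is made up, and
the example numbers are placeholders, not measurements from these tests (the
times were not written down).

```python
import random


def permutation_p_value(times_a, times_b, n_perm=10000, seed=0):
    """One-sided p-value for 'mean of times_b exceeds mean of times_a'.

    Repeatedly shuffles the pooled samples and counts how often a random
    split gives a mean difference at least as large as the observed one.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = mean(times_b) - mean(times_a)
    pooled = list(times_a) + list(times_b)
    split = len(times_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[split:]) - mean(pooled[:split]) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# PLACEHOLDER minutes-to-crash values, invented for this example only.
unmodified = [1.5, 2.0, 2.5, 1.0, 3.0, 2.0]
with_caps = [4.0, 5.0, 6.5, 5.5, 4.5, 7.0]
p = permutation_p_value(unmodified, with_caps)
print(f"p-value: {p:.4f}")
```

A small p-value would say that a shift in crash times this large is unlikely
to come from chance alone. A permutation test is used here because it makes
no assumption about how the crash times are distributed.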


Soren, if you really have an explanation for how a software bug (a race
condition as you say) can be fixed with a soldering iron, i'd really like
to hear it. I have systems that experience race conditions
every once in a while and i'd like to fix those as well with my soldering
iron.


Attila Kinali

-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Alvar Kusma
> The net5501 have 4 ethernet ports, 1 Mini-PCI slot and 2 PCI expansion 
> slots each with 4 interrupt lines. Therefore the 4 interrupt pins on the 
> Geode LX need to be shared along the possible devices. PCI Interrupt 
> sharing is part of the PCI specification and should in principle not be 
> an issue with correct drivers.

After looking at the 2D13 schematics i can confirm that all 3 VT6105M NIC-s 
are connected to different INT# pins on the CS5536 (INTB#, INTC#, INTD#). 
The MiniPCI slot is routed to INTA# and INTB# (shared with eth0). If you count 
J4 as one net5501 PCI expansion slot, then this is practically unusable 
for everyone. Is it correct to assume that, when unused, the MiniPCI slot 
doesn't take an interrupt line either? My exact case - net5501, all four 
interfaces in use + Intel EEPro/100 card on PCI slot and that's all.
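
The routing described above can be written down as a small table to see which
interrupt lines end up shared. This is a sketch: the device labels are made up
for illustration, and the assignment of INTC# and INTD# to eth1 and eth2 is an
assumption, since only the eth0/INTB# pairing is stated explicitly.

```python
from collections import defaultdict

# Alix 2D13 routing as described in this message. Which of eth1/eth2
# uses INTC# and which uses INTD# is assumed for this example.
ALIX_2D13_ROUTING = {
    "eth0 (VT6105M)": ["INTB#"],
    "eth1 (VT6105M)": ["INTC#"],
    "eth2 (VT6105M)": ["INTD#"],
    "Mini-PCI slot": ["INTA#", "INTB#"],
}


def shared_lines(routing):
    """Return {interrupt line: [devices]} for lines used by 2+ devices."""
    users = defaultdict(list)
    for device, lines in routing.items():
        for line in lines:
            users[line].append(device)
    return {line: devs for line, devs in users.items() if len(devs) > 1}


for line, devices in shared_lines(ALIX_2D13_ROUTING).items():
    print(f"{line} is shared by: {', '.join(devices)}")
```

With this table, the only shared line is INTB#, between eth0 and the Mini-PCI
slot. The same function could be fed a net5501 routing table once that
routing is known.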

> can vary depending on exact version and usage patterns

Yes, i must confess, that this can cover almost everything...

-- 
==
___
Soekris-tech mailing list
Soekris-tech@lists.soekris.com
http://lists.soekris.com/mailman/listinfo/soekris-tech


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Attila Kinali
On Sat, 07 Apr 2012 20:32:41 +0200
Soren Kristensen  wrote:

> Are not really interested in commenting on Attila, especially as you 
> seems to have received private emails off list (or the one with photo is 
> stuck in our spam filter).

It is not private and it is not stuck in your spam filter.
I sent it to the mailing list, but apparently a 30kB picture and
a long description of what it shows is too big and thus it is stuck in
the moderation queue. As the moderator of the mailing list has not rejected
the mail yet, i haven't bothered to send it again without the picture.
 
> But of course the net5501 have power and ground planes, that should be 
> obvious to any engineer, it's actually a 6 layer board. And the memory 
> have more than enough decoupling capacitors of different sizes, just go 
> to newegg.com and look at some pictures showing how few capacitors 
> module manufacturers believe are needed

To the contrary, it does not have enough capacitors.

> As I have stated before, afaik the net5501 do not have any design 
> issues, Attila's problem is most likely either software related, wlan 
> card related, or he might have a defect board (rare, but do of course 
> happens), which I have offered to replace before. But of course not if 
> he have ruined it with his experiments

If i had a software problem, why does the behaviour change if
i connect two 1000uF capacitors to the power supply pins available
at J5? And how come the whole issue disappears after soldering three
dozen capacitors at various places?

Attila Kinali

-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-08 Thread Alvar Kusma
> The Linux VT6105M driver has interrupt race problems, reported to have

The 2D13 has the same ethernet chip. Just FYI - i have 10+ Alix boards, all 
without this sort of problem. I only used net5501 boards because i 
need 4 or 5 LAN interfaces, and the Alix is limited to 3 (i don't count usb 
NIC-s as a real alternative).

> Afaik, those reporting problems with crashing are all running Linux with 
> Atheros wlan.

I don't have wlan cards at all, except one (ath5k based, with a 2D13, in my 
home). I just prefer an AP on a dedicated device (WRT54GL or similar), so 
this is not my case.

Anyway, the interrupt line explanation makes some sense, but it doesn't 
explain one of my problems. I tried swapping net5501 units between two 
different places (CF cards with software swapped between net5501-s) and 
the instability symptom "walks" together with the unit. I tried different 
power supplies too. This is not the first time i have mentioned this.

-- 
==


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-07 Thread Alan
On Sun, Apr 8, 2012 at 8:34 AM, Soren Kristensen  wrote:
> Hi Alvar,
>
> Alvar Kusma wrote:
>>
>>> As I have stated before, afaik the net5501 do not have any design
>>> issues, Attila's problem is most likely either software related,
>>
>> Please, can you explain, why similar board from PCEngines (Alix 2D13)
>> with same software (OpenWRT image) just works, but Soekris board shows
>> some unstability? Can you explain, why this same exact software works on
>> one net5501 without a glitch over year now, but two other units show
>> unstability signs - random hangs, sometimes works over month, sometimes
>> crashes 2 times a day? This is still a mystery for me. Just bad luck?
>
> No, pretty simple:
>
> The Linux VT6105M driver has interrupt race problems, reported to have
> been fixed recently, don't know if it have ported to the main Linux sources.
>
> The Atheros wlan drivers seems to also have interrupt race problems,
> don't remember if that have been fixed too.
>
> The Alix 3D13 have few expansion options, just 3 ethernet ports and 1
> Mini-PCI slot, and therefore don't need to share the 4 possible
> interrupt pins on the Geode LX.
>
> The net5501 have 4 ethernet ports, 1 Mini-PCI slot and 2 PCI expansion
> slots each with 4 interrupt lines. Therefore the 4 interrupt pins on the
> Geode LX need to be shared along the possible devices. PCI Interrupt
> sharing is part of the PCI specification and should in principle not be
> an issue with correct drivers.
>
> But can easily be an issue on less than perfect drivers with interrupt
> race problems and not much testing on slow processors, and can vary
> depending on exact version and usage patterns
>
> Afaik, those reporting problems with crashing are all running Linux with
> Atheros wlan.

Not saying that you are wrong, but FYI it also happens to me when
using a Broadcom wlan card in Linux.


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-07 Thread Soren Kristensen
Hi Alvar,

Alvar Kusma wrote:
>
>> As I have stated before, afaik the net5501 do not have any design
>> issues, Attila's problem is most likely either software related,
>
> Please, can you explain, why similar board from PCEngines (Alix 2D13)
> with same software (OpenWRT image) just works, but Soekris board shows
> some unstability? Can you explain, why this same exact software works on
> one net5501 without a glitch over year now, but two other units show
> unstability signs - random hangs, sometimes works over month, sometimes
> crashes 2 times a day? This is still a mystery for me. Just bad luck?

No, pretty simple:

The Linux VT6105M driver has interrupt race problems, reported to have 
been fixed recently, don't know if it have ported to the main Linux sources.

The Atheros wlan drivers seems to also have interrupt race problems, 
don't remember if that have been fixed too.

The Alix 3D13 have few expansion options, just 3 ethernet ports and 1 
Mini-PCI slot, and therefore don't need to share the 4 possible 
interrupt pins on the Geode LX.

The net5501 has 4 ethernet ports, 1 Mini-PCI slot and 2 PCI expansion
slots, each with 4 interrupt lines. Therefore the 4 interrupt pins on the
Geode LX need to be shared among the possible devices. PCI interrupt
sharing is part of the PCI specification and should in principle not be
an issue with correct drivers.

But it can easily be an issue with less than perfect drivers that have
interrupt race problems and have not had much testing on slow processors,
and it can vary depending on exact version and usage patterns.

Afaik, those reporting problems with crashing are all running Linux with 
Atheros wlan.
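
To illustrate the interrupt sharing described above, here is a minimal
sketch in Python. It is a simulation, not driver code: the Device class,
its pending flag and the dispatch function are hypothetical names made up
for this example. The idea is that every handler registered on a shared
line gets called, and each one must check whether its own device actually
raised the interrupt before claiming it. A driver that skips that check
may appear to work on a board where it has the line to itself and
misbehave once the line is shared.

```python
class Device:
    """Hypothetical PCI device with a single interrupt status bit."""

    def __init__(self, name):
        self.name = name
        self.pending = False   # stands in for the device's interrupt status bit
        self.serviced = 0      # how many times the handler did real work

    def raise_irq(self):
        self.pending = True

    def handler(self):
        """Well-behaved handler: claim the interrupt only if it is ours."""
        if not self.pending:
            return False       # not our interrupt, let the others look
        self.pending = False
        self.serviced += 1
        return True


class CarelessDevice(Device):
    """Handler with a bug: assumes every interrupt on the line is its own."""

    def handler(self):
        self.pending = False
        self.serviced += 1     # does work even when nothing was pending
        return True


def dispatch(shared_line):
    """Call every handler on the shared line; return who claimed the interrupt."""
    return [dev.name for dev in shared_line if dev.handler()]


# Two well-behaved devices share a line; only wlan0 raised an interrupt.
eth = Device("eth0")
wlan = Device("wlan0")
wlan.raise_irq()
claimed = dispatch([eth, wlan])

# A careless driver on the same line also "services" eth0's interrupt.
careless = CarelessDevice("careless0")
eth.raise_irq()
claimed_with_bug = dispatch([careless, eth])
```

In the first case only wlan0 claims the interrupt. In the second case the
careless handler runs spuriously although its device never raised anything.
In Linux the same contract is expressed by a handler returning IRQ_NONE or
IRQ_HANDLED on a line that was requested with IRQF_SHARED.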


Best Regards,


Soren Kristensen

CEO & Chief Engineer
Soekris Engineering, Inc.



Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-07 Thread Alvar Kusma

> As I have stated before, afaik the net5501 does not have any design
> issues, Attila's problem is most likely either software related,

Please, can you explain why a similar board from PCEngines (Alix 2D13)
with the same software (OpenWRT image) just works, but the Soekris board
shows some instability? Can you explain why this exact same software has
worked on one net5501 without a glitch for over a year now, but two other
units show signs of instability: random hangs, sometimes working for over
a month, sometimes crashing 2 times a day? This is still a mystery to me.
Just bad luck?

Btw,

 > I am not really interested in commenting on Attila

What good does such a statement do? :-|

-- 
==


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-07 Thread Soren Kristensen
Hi Jeff,

JSL Internet wrote:
> Attila,
>
>Thank you for the detailed post.  It is clear that you have some
> understanding of best practices regarding digital PCB layout.  However,
> you've made a number of assumptions and generalizations.  Are you
> certain that there are no power and ground layers within the net5501
> multi-layer board?  I would be surprised if this were true, but I'll
> leave it up to Soren to clarify.  The photo you posted shows four bypass
> capacitors surrounding the RAM chip in very close proximity.  It would
> be difficult for any layout designer to get them any closer to the
> chip.  Sharing capacitors is just fine as long as there are separate
> runs from the capacitor to each chip.  The whole point of the capacitors
> is to offset the inductance of the PCB power runs.  It is physically
> impossible to have zero-length runs from the capacitors to the chips.
> The chips have small internal capacitors anyway.  Your photo
> demonstrates a good PCB design.

I am not really interested in commenting on Attila, especially as you
seem to have received private emails off list (or the one with the photo
is stuck in our spam filter).

But of course the net5501 has power and ground planes; that should be
obvious to any engineer. It's actually a 6-layer board. And the memory
has more than enough decoupling capacitors of different sizes; just go
to newegg.com and look at some pictures showing how few capacitors
module manufacturers believe are needed.

>If I were having problems with a small SBC (net5501) that only
> occurred when it was attached to an RF transmitter (your WLAN card), I
> would be looking at the RF susceptibility of the SBC, and the isolation
> of the transmission line and transmitting elements from the SBC.  It's
> doubtful that Soren did any RF susceptibility testing or analysis of the
> computer.  Most computer manufacturers do not bother with such things.
> The metal boxes he sells for the SBCs should take care of most of the
> problems anyway.  It's entirely likely that the problems that you've
> "fixed" are related to the RF susceptibility of the net5501.  Small
> pieces of carefully applied brass or mu-metal in selected circuit areas
> would probably have accomplished the same thing that you did by adding
> some capacitors.

As I have stated before, afaik the net5501 does not have any design
issues. Attila's problem is most likely either software related, wlan
card related, or he might have a defective board (rare, but it does of
course happen), which I have offered to replace before. But of course
not if he has ruined it with his experiments.


Best Regards,


Soren Kristensen

CEO & Chief Engineer
Soekris Engineering, Inc.


>Jeff
>
> On 04/06/2012 07:03 AM, Attila Kinali wrote:
>> 
>> The power supply of the net5501 and how it is distributed to the circuitry
>> is disregarding all common good design practices. Hence leading to problems
>> in certain load and use conditions. These problems can be bit errors or
>> complete crashes.
>> 


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-07 Thread JSL Internet
Attila,

  Thank you for the detailed post.  It is clear that you have some 
understanding of best practices regarding digital PCB layout.  However, 
you've made a number of assumptions and generalizations.  Are you 
certain that there are no power and ground layers within the net5501 
multi-layer board?  I would be surprised if this were true, but I'll 
leave it up to Soren to clarify.  The photo you posted shows four bypass 
capacitors surrounding the RAM chip in very close proximity.  It would 
be difficult for any layout designer to get them any closer to the 
chip.  Sharing capacitors is just fine as long as there are separate 
runs from the capacitor to each chip.  The whole point of the capacitors 
is to offset the inductance of the PCB power runs.  It is physically 
impossible to have zero-length runs from the capacitors to the chips.  
The chips have small internal capacitors anyway.  Your photo 
demonstrates a good PCB design.
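
As a rough worked example of why the inductance of the power runs matters:
the voltage dip across a supply run during a fast load step is
approximately V = L * di/dt. The sketch below only evaluates that formula.
The inductance, current step and edge time are illustrative assumptions,
not measurements of the net5501, and the function name is made up for this
example.

```python
def inductive_droop(inductance_h, current_step_a, rise_time_s):
    """Approximate voltage drop across an inductance for a linear
    current ramp: V = L * di/dt."""
    return inductance_h * current_step_a / rise_time_s


# Assumed, illustrative values (not taken from the net5501):
TRACE_INDUCTANCE_H = 5e-9   # 5 nH of supply trace and via inductance
CURRENT_STEP_A = 0.1        # 100 mA load step when the chip switches
RISE_TIME_S = 1e-9          # 1 ns edge

droop_v = inductive_droop(TRACE_INDUCTANCE_H, CURRENT_STEP_A, RISE_TIME_S)
print(f"approximate droop: {droop_v:.2f} V")
```

With these assumed numbers the dip comes out at about half a volt, which is
why a bypass capacitor close to the pin is used to supply the fast current
step locally instead of through the longer run.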

  If I were having problems with a small SBC (net5501) that only 
occurred when it was attached to an RF transmitter (your WLAN card), I 
would be looking at the RF susceptibility of the SBC, and the isolation 
of the transmission line and transmitting elements from the SBC.  It's 
doubtful that Soren did any RF susceptibility testing or analysis of the 
computer.  Most computer manufacturers do not bother with such things.  
The metal boxes he sells for the SBCs should take care of most of the 
problems anyway.  It's entirely likely that the problems that you've 
"fixed" are related to the RF susceptibility of the net5501.  Small 
pieces of carefully applied brass or mu-metal in selected circuit areas 
would probably have accomplished the same thing that you did by adding 
some capacitors.

  Jeff

On 04/06/2012 07:03 AM, Attila Kinali wrote:
> 
> The power supply of the net5501 and how it is distributed to the circuitry
> is disregarding all common good design practices. Hence leading to problems
> in certain load and use conditions. These problems can be bit errors or
> complete crashes.
> 


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-04 Thread Nix
On 4 Apr 2012, Alan told this:

> On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali  wrote:
>
>> I finally got the time to work on this again and got my net5501
>> working without crashes even under heavy load and using wlan at full
>> power. At least not within 24h.
>
> Great that someone is working on this, but 24 hours is not much.  If I
> recall correctly one of my net5501 could go up to 2 weeks (of light
> use) without crashing.

That's not very impressive! Mine is normally up for >70 days (depending
on kernel upgrades), and essentially never crashes for non-software
reasons. It's not heavily loaded most of the time, but it sees large
load spikes sometimes and is doing a lot of network, uh, work.

If you're only seeing two weeks between crashes, something is still
wrong!

> Like others have said, I am eagerly waiting for the description,
> schematics and pictures to fix this.

And I'm wondering why it hasn't affected me. Perhaps because no wlan is
involved, so the power draw is lower?

-- 
NULL && (void)


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-03 Thread Alan
On Sun, Apr 1, 2012 at 7:05 PM, Attila Kinali  wrote:

> I finally got the time to work on this again and got my net5501
> working without crashes even under heavy load and using wlan at full
> power. At least not within 24h.

Great that someone is working on this, but 24 hours is not much.  If I
recall correctly one of my net5501 could go up to 2 weeks (of light
use) without crashing.

Like others have said, I am eagerly waiting for the description,
schematics and pictures to fix this.


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved (Attila Kinali)

2012-04-02 Thread Bob Gustafson
On Tue, 2012-04-03 at 01:27 +0200, Attila Kinali wrote:
> On Mon, 02 Apr 2012 09:54:16 -0700
> JSL Internet  wrote:
> 
> >   You claim to have solved a mystery but you did not provide any 
> > technical details of the problem or the solution.  This is a support 
> > list but instead of being supportive or asking for support, you are 
> > being critical of a product that most users do not have any trouble 
> > with.  Without knowing what you did to the net5501 to "fix" it, nobody 
> > benefits and your credibility suffers.  I'm an experienced RF/digital 
> > engineer and I've seen some "creative" explanations and solutions here 
> > on the Soekris list.
> 
> My credibility? I've never been called an idiot as often as on
> this mailing list; do you really think I have any credibility to lose?
> 
> If you are an RF/digital engineer, just take a random net5501 board
> or a picture of it. Follow the power supply lines and you will see
> why some people have crashes and some do not. It is really that obvious!
> 
> How I fixed it is simple: take a handful of capacitors, sprinkle
> them over the board and solder them where they fall. Depending on
> the load conditions you will also need to add a couple of wires.
> 
>   Attila Kinali

I would like to see some pictures (.jpg) of your modified board. Share
your expertise with the rest of the list.

Bob G




Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved (Attila Kinali)

2012-04-02 Thread Attila Kinali
On Mon, 02 Apr 2012 09:54:16 -0700
JSL Internet  wrote:

>   You claim to have solved a mystery but you did not provide any 
> technical details of the problem or the solution.  This is a support 
> list but instead of being supportive or asking for support, you are 
> being critical of a product that most users do not have any trouble 
> with.  Without knowing what you did to the net5501 to "fix" it, nobody 
> benefits and your credibility suffers.  I'm an experienced RF/digital 
> engineer and I've seen some "creative" explanations and solutions here 
> on the Soekris list.

My credibility? I've never been called an idiot as often as on
this mailing list; do you really think I have any credibility to lose?

If you are an RF/digital engineer, just take a random net5501 board
or a picture of it. Follow the power supply lines and you will see
why some people have crashes and some do not. It is really that obvious!

How I fixed it is simple: take a handful of capacitors, sprinkle
them over the board and solder them where they fall. Depending on
the load conditions you will also need to add a couple of wires.

Attila Kinali
-- 
Why does it take years to find the answers to
the questions one should have asked long ago?


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved (Attila Kinali)

2012-04-02 Thread JSL Internet
Attila,

  You claim to have solved a mystery but you did not provide any 
technical details of the problem or the solution.  This is a support 
list but instead of being supportive or asking for support, you are 
being critical of a product that most users do not have any trouble 
with.  Without knowing what you did to the net5501 to "fix" it, nobody 
benefits and your credibility suffers.  I'm an experienced RF/digital 
engineer and I've seen some "creative" explanations and solutions here 
on the Soekris list.

  Did you try moving your WLAN card external antenna away from the 
net5501?  Is the coaxial cable or the WLAN card itself defective?  Does 
the combined load of the WLAN card and the net5501 exceed the current 
limit for the on-board switching power supply?  Do you have a defective 
net5501? You may have soldered some SM capacitors onto your board and 
"fixed" a problem that is being caused by your WLAN card or your 
configuration.

  Jeff Laing
  soek...@jsl.com


Re: [Soekris] net5501 inexplicable crashes under load and using wlan - mystery solved

2012-04-02 Thread Jan Ceuleers
On 01/04/12 12:05, Attila Kinali wrote:
> Can it be fixed? Not really. It's a design issue. The only way to
> really fix it is to change the design and redo the boards.
> If you already own a net5501, the only thing you can do is patch
> it up until the crash probability reaches a level where you don't care
> anymore. And for this you need a soldering iron and some skill using it,
> as it means modifying a small-pitch SMD PCB. Depending on how you use
> the net5501 this means anything from an hour of soldering to a full day
> or two of rework.

Care to share?