We have a 50' tower that had a Soekris 4511 board running a modified version of Pebble Linux.  The system worked great for nearly 2 years.  We upgraded the system to a Soekris 4521 and bridged both PCMCIA interfaces to make a 2-sector site.  The 2-sector system works great except for one problem: it randomly dies every 1-4 days and never comes back (until a tech goes on site and power-cycles it).

The lockup symptoms are as follows:
1) Blinking link light at the switch port where eth0 is plugged in.
2) No response from any interface, wired or wireless.
3) The system log is set to write a "mark" line every 10 minutes, but nothing is written during the lockup.
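
For anyone who wants to pin down when a lockup started from symptom 3, here is a rough Python sketch (not something we run on the board). It assumes the usual syslog line layout with a leading "Mon DD HH:MM:SS" timestamp; find_gaps, parse_stamp and the year argument are names made up for this illustration.

```python
from datetime import datetime, timedelta

MARK_INTERVAL = timedelta(minutes=10)   # our mark interval


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp; syslog omits the year."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, year, slack=timedelta(minutes=1)):
    """Return (last_entry, next_entry) pairs that are more than one
    mark interval (plus slack) apart, i.e. periods with no logging."""
    stamps = [parse_stamp(line, year) for line in lines if line.strip()]
    return [(a, b) for a, b in zip(stamps, stamps[1:])
            if b - a > MARK_INTERVAL + slack]
```

The first timestamp of each pair is the last thing logged before the box went quiet, and the second is the first entry after the power cycle.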

The system has a working & tested watchdog timer.

What has been tried (not in this order):

1) A cron job that pings the wireless backhaul and reboots if there is no ping reply for 10 minutes. (It never ran.)

2) Thinking it might be a power problem, we replaced the power supplies.

3) Not trusting our PoE Ethernet cable, we ran a second Cat5 cable for DC power only.  Four wires were used for each line of the DC power, which was plugged directly into the motherboard.

4) Added a ground rod & cable to improve tower grounding. (Remember, though, that the single-sector system worked fine without this added grounding.)

5) Swapped out the 4521 motherboard.

6) Created a bench test system.  This was an exact duplicate of the tower system, minus the external antennas, run on the bench:

wireless LT -> 2-sector system (backhaul link) -> wireless router -> wired laptop

In this test system our test AP runs without any wired connections, as it does in the field.  We ran repeated copy scripts flat out for 3-4 days and transferred approx. 40G at about 3Mb/s (way more than actual field conditions!).

We never saw the test system lock up; its uptime was always correct.  That same 4521 motherboard is now on the tower, and we still see the problem.
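
In case it helps to see the logic of the ping check from item 1, here is a minimal Python sketch of the idea. It is not our exact script: the address, the state file path, and the function names are placeholders, and it assumes a ping that accepts -c and -W.

```python
import subprocess
import time
from pathlib import Path

BACKHAUL_IP = "192.0.2.1"               # placeholder, not our real address
STATE_FILE = Path("/tmp/last_ping_ok")  # hypothetical state file
MAX_SILENCE_S = 10 * 60                 # reboot after 10 min without a reply


def ping_ok(host):
    """Send one ping; True if the host answered within 2 seconds."""
    cmd = ["ping", "-c", "1", "-W", "2", host]
    return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def next_state(last_ok, now, reply_seen, limit=MAX_SILENCE_S):
    """Return (new_last_ok, reboot_needed) for one check."""
    if reply_seen:
        return now, False
    return last_ok, (now - last_ok) >= limit


def run_once():
    """One cron invocation: ping, update the state file, reboot if overdue."""
    now = time.time()
    try:
        last_ok = float(STATE_FILE.read_text())
    except (OSError, ValueError):
        last_ok = now                   # first run: start the clock now
    last_ok, reboot = next_state(last_ok, now, ping_ok(BACKHAUL_IP))
    STATE_FILE.write_text(str(last_ok))
    if reboot:
        subprocess.run(["reboot"])
```

Cron would call run_once() every minute or so. A check like this only helps while cron is still being scheduled, and ours never ran during a lockup.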

Any suggestions??

Thank you kindly,
