> "The net6501 seems to have this sort of problem, and it's fatal. At some point your box will likely
> stop booting entirely. [...] The root cause of death is not well understood, at least not by anyone
> outside of Soekris. There does not seem to be a cure. The 5501 by comparison seems almost
> impossible to kill."

I have some speculation on what is happening behind the scenes here; it could be dead-on - or completely wacko. I don't expect anyone to wade through this whole posting, as it's way too long (though there is a two-sentence TL;DR version at the bottom). I'm putting it out there, though, just in case some future reader is piecing the puzzle together and finds it helpful.

----

So: let's take what we either know as fact, or can pretty reliably believe to be true, and speculate from there:

1) Soren K / Soekris clearly has the ability to design and manufacture stable, reliable systems - witness every Soekris product before the net6501.

2) Yet...the net6501 has a maddening, common failure mode, one so severe that the board is totally bricked, without recourse, even if it was running perfectly minutes beforehand.

3) No one understands the root causes of the failure, and Soekris has been extraordinarily close-lipped about it, in these forums and elsewhere. No explanations whatsoever. Total silence, even as reports of dead boards mount and their reputation looks worse and worse. Doesn't make sense.

4) Data point A: If you send a dead board back to Soekris _in_ the warranty period, it will be quickly replaced without question.
   Data point B: If you send a dead board back to Soekris _out_ of warranty, for diagnosis and repair, it will simply be returned as "can't be fixed".
   Data point C: When you inquire _why_ it can't be fixed - after all, aren't components, even board-level ones, ultimately replaceable? - you won't get any answer at all. They'll just tell you... "we're really sorry, wish we could, we can't." End of story.

5) Last year, Soekris suddenly and with very little explanation cancelled development of a new Intel-Atom-based board - what would have been the net6801.
6) Intel had/has a severe problem with clock chips in other embedded Atom CPU products (but supposedly, not the Atom E6xx the net6501 uses - see: https://www.theregister.co.uk/2017/02/06/cisco_intel_decline_to_link_product_warning_to_faulty_chip/ ). This clock chip failure is so severe that it can completely kill the system it's being used in.

SO....I'm going to go out on a limb here and suggest one speculative scenario that pulls together all of the above and explains the net6501 and Soekris's actions around it. Note carefully that I have no association with Soekris nor Intel whatsoever, except as a customer of both; this speculation is constructed with absolutely zero "inside" information.

A) The net6501 brickings we see are due to a similar - or identical - Intel clock chip degradation issue as the C2000 embedded Atom chip had. The failure is extreme and can't be patched around by simple mods to hardware or firmware; it's 'baked in' to the product.

B) Soekris knows this. Intel privately admits this to them.

C) Intel, wanting to avoid the bad publicity that would result from their component's failure when used in a supposedly "super reliable" machine built entirely around it, makes a simple deal with Soekris: we'll pay the costs of ALL warranty replacements for your boards that die due to our chip biting the dust. And further, pay for you to scrap your current inventory with the bad part.

D) Soekris accepts, and pays nothing for Intel's failure. They get what amounts to a huge settlement for such a small company, without spending big on lawyer fees or waiting for years to get the dough. Their future liability for dead net6501s is now zero.

E) In return, Intel pays out what (for them) amounts to change found in their couch cushions - it's a trivial amount. More importantly, since it's only agreeing to reimburse Soekris for in-warranty items, its financial exposure is strictly limited - it ends exactly three years from when Soekris sold the last bad-Intel-part board.
(Public corporation accountants *love* closed-end liabilities.) Everyone's happy...

F) ....except for: the Faustian bargain Soekris had to make with Intel's horde of lawyers to have the above happen. As a condition of accepting the above pile o' cash and shielding themselves from crippling warranty-return payouts, Soekris signs in blood that they will maintain 1001% silence on the issue and the nature - or even existence! - of their agreement with Intel. They're allowed to make NO mention of *why* their boards are dying prematurely. Never, ever. Not even a HINT. They have to clam up forever - or else the money spigot dries up and/or they get sued into oblivion.

G) So...Soekris quietly replaces the net6501s that Intel gives them hard cash to take care of, and regretfully ignores the rest. Their bank account remains intact - but at the cost of their formerly stellar reputation being in tatters, with nothing they can do about it.

H) The only way Soekris can even remotely complain and signal their displeasure is to publicly, but without any convincing explanation, cancel their future Intel-based product plans completely (the net6801).

Yeah, there are some holes in the above, and definitely some alternative explanations. But Soekris's behavior is so bizarre and irrational that I'm left guessing that something like this has to be at work. When faced with something happening contrary to all logic, _sequere pecuniam_ is not a bad place to go first.

(Promised TL;DR version: Intel's effed-up Atoms destroy net6501s once in the field. Intel pays Soekris to replace the bad boards that result, but makes payments conditional on total, absolute silence about it by Soekris.)

There you have it. Insightful - or absolutely wacko? You decide. But do save this posting in case it gets removed from the archives...

/DR/

> > On 2017, Mar 19, at 6:26 PM, Dries Verachtert <dries.veracht...@dries.eu> wrote:
> >
> > Dear Soekris wizards,
> >
> > I have a net6501 soekris device and it has a strange issue: when the
> > device is working correctly and I reboot, then it doesn't start
> > anymore: the red error led stays on and there's no output on the
> > serial port. A reset by pressing the reset button or a reset by
> > disconnecting and connecting again the power source does not make a
> > difference: the device simply does not start anymore. The only thing
> > that still works is the uManager Monitor that I can access with '+++':
> > I can even upload the latest rom and issue commands like 'power cycle'
> > but in the end to no avail: the regular bios/pxeboot/os does not
> > start. If I keep the device unplugged from a power source for +/- 8 minutes,
> > then it does start again and everything works like it should.
> >
> > The device has been running uninterrupted for years without a reboot
> > in the 19" case with built-in power supply that is sold by Soekris.
> > I've also tried now with some other 12V power supply but it doesn't
> > seem to make a difference.
> >
> > Do you maybe know what could be the problem? Any suggestions on what I
> > still could try or how I might be able to solve the problem?
> >
> > Kind regards,
> > Dries
_______________________________________________
Soekris-tech mailing list
Soekris-tech@lists.soekris.com
http://lists.soekris.com/mailman/listinfo/soekris-tech