Re: requesting help working around boot failures with supermicro atom board
Synopsis: if the sensors show missing data, reset the BMC before rebooting the system. This prevents the "unable to boot, long beep" failure.

I found a reliably reproducible workaround for this problem that keeps remote control of the machine without the need to trip the mains breaker. It entirely prevents the long beep issue and allows the system to be used in headless remote environments without remote mains power-cycle capability or remote hands intervention.

I have not had to disable the lm(4) sensor, as advised previously, for the workaround to work. My conclusion is that this problem is not caused by the driver in the first place, but by buggy BMC firmware. It is therefore advisable to contact Supermicro technical support again and ask for a reliable BMC firmware update that does not exhibit the problem.

After the system has been running for a longer period (not specific or deterministic, but above 30 minutes), the sensors start to display wrong (missing) values and no longer provide data points to the BMC firmware. This is seen in direct IPMI access, in networked IPMI access, and in the web based management interface. At this point, a reboot leaves the system unable to boot, with the dreaded long beep. Only a mains power cycle (power supply breaker or power distribution unit) of a couple of seconds unblocks the system so that it can boot successfully again.

This, however, totally undermines the remote control capabilities of the system: every reboot turns into a request for someone to cycle mains power by hand.

The workaround is to reset the BMC before attempting to reboot the system. It works over the network directly via IPMI and also via the web based BMC interface. The reset only reboots the IPMI controller and its embedded firmware, not the system. After a couple of minutes the sensors poll actual correct data again and display it properly.

At this point a system reboot succeeds as expected, and the system boots up and works properly until some non-specific longer time passes again (from 1h to days) and the BMC gets stuck again. It does get stuck with certainty; the indication is missing sensor data, followed by the long beep and no boot on the next reboot.

This is NOT OS specific, unless the driver polling the sensors causes the sensors sub-system in the embedded controller OS to crash. The only factor found to affect it so far is the time the system has been running without a mains power cycle. It is a flaw of the BMC firmware, and the real solution is to demand an updated firmware from Supermicro without this fault. It would help if more people voiced their concerns, so that Supermicro technical support issues an updated BMC firmware and publishes it on their web site.

Here is how it looks when the BMC is stuck:

$ ipmi-sensor
System Temp   | no reading | ns
CPU Temp      | no reading | ns
CPU FAN       | no reading | ns
SYS FAN       | no reading | ns
CPU Vcore     | no reading | ns
Vichcore      | no reading | ns
+3.3VCC       | no reading | ns
VDIMM         | no reading | ns
+5 V          | no reading | ns
+12 V         | no reading | ns
+3.3VSB       | no reading | ns
VBAT          | no reading | ns
Chassis Intru | no reading | ns
PS Status     | 0x00       | ok

$ ipmi-sensor-detail
System Temp   | na |  | na | na | na | na | na | na | na
CPU Temp      | na |  | na | na | na | na | na | na | na
CPU FAN       | na |  | na | na | na | na | na | na | na
SYS FAN       | na |  | na | na | na | na | na | na | na
CPU Vcore     | na |  | na | na | na | na | na | na | na
Vichcore      | na |  | na | na | na | na | na | na | na
+3.3VCC       | na |  | na | na | na | na | na | na | na
VDIMM         | na |  | na | na | na | na | na | na | na
+5 V          | na |  | na | na | na | na | na | na | na
+12 V         | na |  | na | na | na | na | na | na | na
+3.3VSB       | na |  | na | na | na
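
The check described in this message ("if the sensors show missing data, reset the BMC before rebooting") could be automated roughly as follows. This is only a sketch under assumptions, not what the poster actually ran: it assumes sensor output in the three-column `name | reading | status` form shown in the thread, and the function names, the 0.8 threshold and the two shortened sample listings are made up for illustration.

```python
# Hypothetical helper: decide from sdr-style sensor output whether the BMC
# looks stuck, and therefore whether it should be reset before a reboot.

# Readings that the thread shows for a stuck BMC.
MISSING_READINGS = {"no reading", "na"}

# Shortened samples modelled on the listings in the thread.
STUCK_SAMPLE = """\
System Temp   | no reading | ns
CPU Temp      | no reading | ns
CPU FAN       | no reading | ns
CPU Vcore     | no reading | ns
+12 V         | no reading | ns
PS Status     | 0x00       | ok
"""

HEALTHY_SAMPLE = """\
System Temp   | 41 degrees C | ok
CPU Temp      | 42 degrees C | ok
CPU FAN       | no reading   | ns
SYS FAN       | no reading   | ns
+12 V         | 12.08 Volts  | ok
PS Status     | 0x00         | ok
"""


def parse_sdr(text):
    """Return (name, reading, status) tuples for each well-formed row."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 3 and parts[0]:
            rows.append((parts[0], parts[1], parts[2]))
    return rows


def bmc_looks_stuck(text, threshold=0.8):
    """True when at least `threshold` of the sensors report no data.

    A threshold is used instead of "any missing sensor" because the fan
    sensors on this board read "no reading" even when the BMC is healthy.
    """
    rows = parse_sdr(text)
    if not rows:
        return False
    missing = sum(
        1 for _, reading, status in rows
        if reading in MISSING_READINGS or status == "ns"
    )
    return missing / len(rows) >= threshold


def next_step(text):
    """Name the action to take before rebooting (labels are illustrative)."""
    return "reset-bmc" if bmc_looks_stuck(text) else "reboot"
```

The message does not say which command was used for the reset. With ipmitool, `mc reset cold` is one way to restart only the management controller; after it, wait the couple of minutes mentioned above and re-check the sensors before issuing the reboot.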
Re: requesting help working around boot failures with supermicro atom board
I have a great relationship with some SuperMicro engineers. If others can provide part #'s and firmware/BIOS revs, I can bring this up with them.

From: owner-m...@openbsd.org <owner-m...@openbsd.org> on behalf of li...@wrant.com <li...@wrant.com>
Sent: Wednesday, October 21, 2015 8:50 PM
To: misc@openbsd.org
Subject: Re: requesting help working around boot failures with supermicro atom board
Re: requesting help working around boot failures with supermicro atom board
On Wed, Oct 07, 2015 at 11:17:25PM -0400, Dewey Hylton wrote:
> you missed my update which followed that post. it did not survive the night
> - even with lm disabled in the kernel, some number of reboots later i
> encountered the same failure. that update is on the list, but i'll include
> the copy/paste below.
>
> meanwhile, is there still hope for answers relating to acpi?
>

I doubt it. I took a look at your AML and it seemed reasonable.

-ml
Re: requesting help working around boot failures with supermicro atom board
ah, well thanks for taking a look.

On Thu, Oct 8, 2015 at 3:09 PM, Mike Larkin <mlar...@azathoth.net> wrote:
> > meanwhile, is there still hope for answers relating to acpi?
> >
>
> I doubt it. I took a look at your AML and it seemed reasonable.
>
> -ml
Re: requesting help working around boot failures with supermicro atom board
you missed my update which followed that post. it did not survive the night - even with lm disabled in the kernel, some number of reboots later i encountered the same failure. that update is on the list, but i'll include the copy/paste below.

meanwhile, is there still hope for answers relating to acpi?

-- Forwarded message --
From: Dewey Hylton <dewey.hyl...@gmail.com>
To: misc@openbsd.org
Cc:
Date: Tue, 15 Sep 2015 19:19:10 + (UTC)
Subject: Re: requesting help working around boot failures with supermicro atom board

Dewey Hylton gmail.com> writes:
>
> Mark Kettenis xs4all.nl> writes:

> > Oh that is interesting. Can you try disabling the lm(4) driver in
> > your kernel? You can do:
> >
> > # config -ef /bsd
> > ...
> > ukc> disable lm
> > 254 lm0 disabled
> > 255 lm* disabled
> > 256 lm* disabled
> > ukc> quit
> > Saving modified kernel.
> > # reboot
> >
> > That reboot will probably still hang. But it'd be interesting to see
> > if any subsequent reboots work better.
>

sadly, the first thing i heard when entering the lab this morning was

BEP!

so disabling the sensor drivers in the kernel did not do the trick. without other ideas, i'm down to providing acpidump output and hoping someone can tell me where to go next ...

On Wed, Oct 7, 2015 at 12:41 AM, Mike Larkin <mlar...@azathoth.net> wrote:
> Based on this and my previous email, my recommendation would be to disable
> lm(4) on this particular machine.
Re: requesting help working around boot failures with supermicro atom board
Tue, 6 Oct 2015 21:41:15 -0700 Mike Larkin
> I had thought this was acpi related earlier (before we realized that disabling
> lm* fixes it). So I have no news here, as I don't think the solution is going
> to be found in the AML.

Thanks for the update and the pointer in the right direction (regarding disabling the lm(4) sensor). Indeed this does not happen with bsd.rd during upgrades, and I recall that this issue may not have been present originally, back in 2011.

> The lm(4) sensor is probably getting wedged somehow, which is causing the bios
> to think the machine is too hot on reboot. Even though it's not.

Makes sense, as the readings are improbable after running for a while and things look stuck somehow, including in the BMC web interface.

Side note: I know, just don't sway to the insane design flaws regarding security and interfaces, there are popcorn scary topics on the list, now stay on topic pls.

Here is the reading from a long run where the sensors appear stuck both on the shell and in the BMC:

$ sysctl hw.sensors
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=-1.00 degC
hw.sensors.lm1.temp1=-0.50 degC
hw.sensors.lm1.temp2=-0.50 degC
hw.sensors.lm1.volt0=2.04 VDC (VCore)
hw.sensors.lm1.volt1=13.46 VDC (+12V)
hw.sensors.lm1.volt2=4.08 VDC (+3.3V)
hw.sensors.lm1.volt3=4.08 VDC (+3.3V)
hw.sensors.lm1.volt4=1.85 VDC (-12V)
hw.sensors.lm1.volt5=0.00 VDC
hw.sensors.lm1.volt6=0.00 VDC
hw.sensors.lm1.volt7=4.08 VDC (3.3VSB)
hw.sensors.lm1.volt8=2.04 VDC (VBAT)
$

And again after resetting the IPMI device. Some readings still look incorrect, but they are not stuck as above:

$ sysctl hw.sensors
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=41.00 degC
hw.sensors.lm1.temp1=42.00 degC
hw.sensors.lm1.temp2=26.00 degC
hw.sensors.lm1.volt0=1.10 VDC (VCore)
hw.sensors.lm1.volt1=6.86 VDC (+12V)
hw.sensors.lm1.volt2=3.33 VDC (+3.3V)
hw.sensors.lm1.volt3=3.33 VDC (+3.3V)
hw.sensors.lm1.volt4=-10.34 VDC (-12V)
hw.sensors.lm1.volt5=1.28 VDC
hw.sensors.lm1.volt6=1.82 VDC
hw.sensors.lm1.volt7=3.28 VDC (3.3VSB)
hw.sensors.lm1.volt8=1.57 VDC (VBAT)

> I don't know a lot about the lm(4) driver so I don't think I'll be able to
> help much here. One of the things I do know about it is that sometimes you
> don't actually even have a real lm(4), and that it's simulated by some other
> component or even SMM. Maybe the manufacturer did a poor job. Shrug.

Please compare the above with the values presented in the BMC web interface:

Name            Status              Reading
System Temp     Normal              41 degrees C
CPU Temp        Normal              42 degrees C
CPU FAN         N/A                 Not Present!
SYS FAN         N/A                 Not Present!
CPU Vcore       Normal              1.096 Volts
Vichcore        Normal              1.04 Volts
+3.3VCC         Normal              3.328 Volts
VDIMM           Normal              1.528 Volts
+5 V            Normal              5.12 Volts
+12 V           Normal              12.084 Volts
+3.3VSB         Normal              3.28 Volts
VBAT            Normal              3.136 Volts
Chassis Intru   OK
PS Status       Presence detected.

Here is the same from ipmitool over the network:

$ ipmitool -I lanplus -U musr1 -f .ipmipass -H 10.10.10.10 sdr
System Temp   | 41 degrees C | ok
CPU Temp      | 42 degrees C | ok
CPU FAN       | no reading   | ns
SYS FAN       | no reading   | ns
CPU Vcore     | 1.10 Volts   | ok
Vichcore      | 1.04 Volts   | ok
+3.3VCC       | 3.33 Volts   | ok
VDIMM         | 1.54 Volts   | ok
+5 V          | 5.12 Volts   | ok
+12 V         | 12.08 Volts  | ok
+3.3VSB       | 3.28 Volts   | ok
VBAT          | 3.14 Volts   | ok
Chassis Intru | 0x00         | ok
PS Status     | 0x00         | ok
$

Same thing with more details and thresholds (untouched from defaults, for reference only, in case this is where the lm(4) sensor gets some of the funny values):

$ ipmitool -I lanplus -U musr1 -f .ipmipass -H 10.10.10.10 sensor
System Temp | 42.000 | degrees C | ok | -9.000  | -7.000 | -5.000 | 75.000 | 77.000 | 79.000
CPU Temp    | 42.000 | degrees C | ok | -11.000 | -8.000 | -5.000 | 85.000 | 90.000 | 95.000
CPU FAN     | na     |           | na | na      | na     | na     | na     | na     | na
SYS FAN     | na     |           | na | na      | na     | na     | na     | na     | na
CPU Vcore   | 1.096  | Volts     | ok | 0.640   | 0.664  | 0.688  | 1.344  | 1.408  | 1.472
Vichcore    | 1.040  | Volts     | ok | 0.808   | 0.824  | 0.840  | 1.160  | 1.176  | 1.192
+3.3VCC     | 3.328  | Volts     | ok | 2.816   | 2.880  | 2.944  | 3.584  | 3.648  | 3.712
VDIMM       | 1.528  | Volts
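
To make the comparison in this message less manual, something like the following could flag lm(4) voltage readings that sit far from the rail named in their own label. It is a rough sketch under assumptions: the `name=value unit (label)` line shape is taken from the `sysctl hw.sensors` output quoted in the thread, while the function names, the 10% tolerance and the idea of reading the nominal voltage out of the label are illustrative choices, not anything the driver or the thread specifies.

```python
import re

# One sysctl line: hw.sensors.<dev>.<sensor>=<value> <unit> [(<label>)]
LINE_RE = re.compile(
    r"^(hw\.sensors\.\S+?)=(-?\d+(?:\.\d+)?) (\S+)(?: \((.+)\))?$"
)
# Labels such as "+12V", "-12V", "+3.3V", "3.3VSB" start with a nominal value.
NOMINAL_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)V")

# Readings quoted in the thread, taken after the IPMI device was reset.
AFTER_RESET = """\
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=41.00 degC
hw.sensors.lm1.volt0=1.10 VDC (VCore)
hw.sensors.lm1.volt1=6.86 VDC (+12V)
hw.sensors.lm1.volt2=3.33 VDC (+3.3V)
hw.sensors.lm1.volt3=3.33 VDC (+3.3V)
hw.sensors.lm1.volt4=-10.34 VDC (-12V)
hw.sensors.lm1.volt5=1.28 VDC
hw.sensors.lm1.volt7=3.28 VDC (3.3VSB)
hw.sensors.lm1.volt8=1.57 VDC (VBAT)
"""


def parse_sensors(text):
    """Return (name, value, unit, label) tuples; label is None if absent."""
    readings = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, value, unit, label = match.groups()
            readings.append((name, float(value), unit, label))
    return readings


def off_nominal(readings, tolerance=0.10):
    """Return (name, value, nominal) for voltages outside the tolerance.

    Only sensors whose label starts with a number are checked; labels
    like "VCore" or "VBAT" carry no nominal value and are skipped.
    """
    flagged = []
    for name, value, unit, label in readings:
        if unit != "VDC" or not label:
            continue
        match = NOMINAL_RE.match(label)
        if not match:
            continue
        nominal = float(match.group(1))
        if abs(value - nominal) > tolerance * abs(nominal):
            flagged.append((name, value, nominal))
    return flagged
```

A flagged value is not proof of a hardware fault. In the data above the BMC reports +12 V at about 12.08 V while lm(4) shows 6.86 V for the sensor it labels +12V, so the mismatch may just as well come from how the driver scales or labels that input on this board.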
Re: requesting help working around boot failures with supermicro atom board
On Mon, Oct 05, 2015 at 01:18:53PM -0400, dewey.hyl...@gmail.com wrote:
> unfortunately, not on my end. i have hopes that mike larkin may find something
> when he gets a chance to look, but i am past the limit of my capabilities and
> supermicro support has discontinued responding to me. their last suggestion
> was to switch to linux or windows, and their last message was of the "we'll
> get back to you" variety.
>

I had thought this was acpi related earlier (before we realized that disabling lm* fixes it). So I have no news here, as I don't think the solution is going to be found in the AML.

The lm(4) sensor is probably getting wedged somehow, which is causing the bios to think the machine is too hot on reboot. Even though it's not.

I don't know a lot about the lm(4) driver so I don't think I'll be able to help much here. One of the things I do know about it is that sometimes you don't actually even have a real lm(4), and that it's simulated by some other component or even SMM. Maybe the manufacturer did a poor job. Shrug.

Sorry, I'm out of ideas. Maybe someone else can debug it for you.

-ml

> so on a related note, i'm on the hunt for something which can replace this
> board's functionality without breaking the bank. something not supported by
> supermicro, as this is a brand new board and they seem to be unwilling to
> provide support anyway. remote kvm/power is the sole purpose for choosing this
> supermicro device in the first place. i have plenty much more expensive and
> more powerful supermicro devices at customer sites which do not show this
> issue - but their non-support of this brand-new motherboard shows me that they
> are not who i want to be relying on.
>
> - On Oct 5, 2015, at 12:08 PM, Sonic sonicsm...@gmail.com wrote:
>
> Any progress on this issue?
Re: requesting help working around boot failures with supermicro atom board
On Tue, Sep 15, 2015 at 02:45:02AM +, Dewey Hylton wrote:
> Mark Kettenis xs4all.nl> writes:
>
> > >
> > > # sysctl -a|grep 'sensors.*temp'
> > > hw.sensors.cpu0.temp0=30.00 degC
> > > hw.sensors.lm1.temp0=0.00 degC
> > > hw.sensors.lm1.temp1=14.00 degC
> > > hw.sensors.lm1.temp2=14.00 degC
> > > # reboot
> > >
> > > BEEEP!
> >
> > Oh that is interesting. Can you try disabling the lm(4) driver in
> > your kernel? You can do:
> >
> > # config -ef /bsd
> > ...
> > ukc> disable lm
> > 254 lm0 disabled
> > 255 lm* disabled
> > 256 lm* disabled
> > ukc> quit
> > Saving modified kernel.
> > # reboot
> >
> > That reboot will probably still hang. But it'd be interesting to see
> > if any subsequent reboots work better.
>
> *this* interests me, and was basically what i was asking in the original
> post - except i had no idea what might need to be disabled. one step at a
> time, it's been interesting the things that have popped up.
>
> still no idea whether this has anything to do with the seemingly
> openbsd-only issue, but ...
>
> i made this change, booted the new kernel, ran 'cksum /dev/mem' a bunch of
> times in hopes of raising the temperature somewhat (did get to 36C, which is
> higher than in my previous tests). then i rebooted, and the box came back up
> without incident.
>
> so i'm going to run through this several times with reboots in every 20
> minutes or so and see if it survives the night.
>

Based on this and my previous email, my recommendation would be to disable lm(4) on this particular machine.
Re: requesting help working around boot failures with supermicro atom board
On Mon, Oct 5, 2015 at 1:18 PM, dewey.hyl...@gmail.com wrote:
> but their non-support of this brand-new motherboard

When the other OS's work fine it does seem to point to an OpenBSD issue, but that's not always a reliable conclusion to arrive at. Either way it would be nice to see it resolved. There was a time when the problem did not exist, but it's been so long that I no longer have any clue when the change that triggered the issue occurred.

Chris
Re: requesting help working around boot failures with supermicro atom board
unfortunately, not on my end. i have hopes that mike larkin may find something when he gets a chance to look, but i am past the limit of my capabilities and supermicro support has discontinued responding to me. their last suggestion was to switch to linux or windows, and their last message was of the "we'll get back to you" variety.

so on a related note, i'm on the hunt for something which can replace this board's functionality without breaking the bank. something not supported by supermicro, as this is a brand new board and they seem to be unwilling to provide support anyway. remote kvm/power is the sole purpose for choosing this supermicro device in the first place. i have plenty much more expensive and more powerful supermicro devices at customer sites which do not show this issue - but their non-support of this brand-new motherboard shows me that they are not who i want to be relying on.

- On Oct 5, 2015, at 12:08 PM, Sonic sonicsm...@gmail.com wrote:

Any progress on this issue?
Re: requesting help working around boot failures with supermicro atom board
Any progress on this issue?

On Thu, Sep 17, 2015 at 1:26 PM, Mike Larkin wrote:
> Haven't had a chance to look at it yet.
>
> -ml
Re: requesting help working around boot failures with supermicro atom board
Dewey Hylton gmail.com> writes: > > Mike Larkin azathoth.net> writes: > > > > > On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote: > > > Dewey Hylton gmail.com> writes: > > > > > > > > > > > Mike Larkin azathoth.net> writes: > > > > > > > > acpidump please. > motherboard: supermicro x7spe-hf-d525 rev 1.0 > bios: 1.2b > > at the end of this link is an archive containing acpidump output for all > three acpi settings in the bios (1.0, 2.0, 3.0). > > https://goo.gl/tWGL6C > > i apologize for the somewhat hidden link; gmane wouldn't allow me to post > the full link because it's greater than 80 characters. > > please let me know if i can help in any way; i honestly know nothing about > acpi but am willing to learn or assist otherwise if it means understanding > and potentially fixing this issue. i was able to export the DSDT files into something human-readable. while i don't really understand much of what i'm seeing in the resulting text files, diff shows that the differences between the three acpi versions are nonexistent. i have no idea about the other files, of which there are several. Mike, does the acpidump output help at all? if not, am i simply at the point where this hardware is not compatible with OpenBSD?
Re: requesting help working around boot failures with supermicro atom board
On Thu, Sep 17, 2015 at 12:40:12PM +, Dewey Hylton wrote: > Dewey Hylton gmail.com> writes: > > > > > Mike Larkin azathoth.net> writes: > > > > > > > > On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote: > > > > Dewey Hylton gmail.com> writes: > > > > > > > > > > > > > > Mike Larkin azathoth.net> writes: > > > > > > > > > > acpidump please. > > > motherboard: supermicro x7spe-hf-d525 rev 1.0 > > bios: 1.2b > > > > at the end of this link is an archive containing acpidump output for all > > three acpi settings in the bios (1.0, 2.0, 3.0). > > > > https://goo.gl/tWGL6C > > > > i apologize for the somewhat hidden link; gmane wouldn't allow me to post > > the full link because it's greater than 80 characters. > > > > please let me know if i can help in any way; i honestly know nothing about > > acpi but am willing to learn or assist otherwise if it means understanding > > and potentially fixing this issue. > > i was able to export the DSDT files into something human-readable. while i > don't really understand much of what i'm seeing in the resulting text files, > diff shows that the differences between the three acpi versions are > nonexistent. i have no idea about the other files, of which there are several. > > Mike, does the acpidump output help at all? if not, am i simply at the point > where this hardware is not compatible with OpenBSD? > Haven't had a chance to look at it yet. -ml
Re: requesting help working around boot failures with supermicro atom board
Mike Larkin azathoth.net> writes: > > On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote: > > Dewey Hylton gmail.com> writes: > > > > > > > > Mike Larkin azathoth.net> writes: > > > > > > acpidump please. > > > > > > my pleasure: > > > > > > [demime removed a uuencoded section named > > supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines] > > > > > > > > > > alright ... so this didn't work. i'll try to make the acpidump available via > > another site somewhere. on that note, the bios allows selection between acpi > > 1/2/3 - would it help at all to have acpidump for each of those three settings? > > > > Sure. > > motherboard: supermicro x7spe-hf-d525 rev 1.0 bios: 1.2b at the end of this link is an archive containing acpidump output for all three acpi settings in the bios (1.0, 2.0, 3.0). https://goo.gl/tWGL6C i apologize for the somewhat hidden link; gmane wouldn't allow me to post the full link because it's greater than 80 characters. please let me know if i can help in any way; i honestly know nothing about acpi but am willing to learn or assist otherwise if it means understanding and potentially fixing this issue.
Re: requesting help working around boot failures with supermicro atom board
On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote: > Dewey Hylton gmail.com> writes: > > > > > Mike Larkin azathoth.net> writes: > > > > acpidump please. > > > > my pleasure: > > > > [demime removed a uuencoded section named > supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines] > > > > > > alright ... so this didn't work. i'll try to make the acpidump available via > another site somewhere. on that note, the bios allows selection between acpi > 1/2/3 - would it help at all to have acpidump for each of those three > settings? > Sure.
Re: requesting help working around boot failures with supermicro atom board
Dewey Hylton gmail.com> writes: > > Mark Kettenis xs4all.nl> writes: > > Oh that is interesting. Can you try disabling the lm(4) driver in > > your kernel? You can do: > > > > # config -ef /bsd > > ... > > ukc> disable lm > > 254 lm0 disabled > > 255 lm* disabled > > 256 lm* disabled > > ukc> quit > > Saving modified kernel. > > # reboot > > > > That reboot will probably still hang. But it'd be interesting to see > > if any subsequent reboots work better. > sadly, the first thing i heard when entering the lab this morning was BEP! so disabling the sensor drivers in the kernel did not do the trick. without other ideas, i'm down to providing acpidump output and hoping someone can tell me where to go next ...
Re: requesting help working around boot failures with supermicro atom board
Dewey Hylton gmail.com> writes: > > Mike Larkin azathoth.net> writes: > > acpidump please. > > my pleasure: > > [demime removed a uuencoded section named supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines] > > alright ... so this didn't work. i'll try to make the acpidump available via another site somewhere. on that note, the bios allows selection between acpi 1/2/3 - would it help at all to have acpidump for each of those three settings?
Re: requesting help working around boot failures with supermicro atom board
On Sat, Sep 12, 2015 at 03:51:36PM +, Dewey Hylton wrote: > the only real differences i see are: > 1) bios revision > 2) secondary disk attached to different sata port > 3) sensors only present on working machine I've had this issue with the same systems. Never guessed it would be OpenBSD specific. What I've found to make it stop happening is pulling the board out and redoing the thermal paste for the CPU heatsink. I had found some reference indicating that the alarm I got might be because of overheating. The difference between the boxes may be the attention to detail the factory worker who put it together had that day. Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI difference between OpenBSD and Linux. Perhaps OpenBSD runs the CPU hotter before turning it back over to the BIOS on reboot? --Kurt
Re: requesting help working around boot failures with supermicro atom board
Patrick Dohman comcast.net> writes: > > Any thermal settings in the bios? CPU performance, Fan Speed etc.. > > Does the fan idle correctly? Often intel chipsets will throttle the fan during a bios test. > > Perhaps ACPI is not routing an interrupt?? Not much is available to be tweaked in this particular setup, though i do have the options of acpi 1, 2, 3. changing those doesn't appear to result in any difference. regarding ACPI not routing an interrupt ... can you be more specific? is there some way i could test this?
Re: requesting help working around boot failures with supermicro atom board
Kurt Mosiejczuk se.rit.edu> writes: > > On Sat, Sep 12, 2015 at 03:51:36PM +, Dewey Hylton wrote: > > > the only real differences i see are: > > 1) bios revision > > 2) secondary disk attached to different sata port > > 3) sensors only present on working machine > > I've had this issue with the same systems. Never guessed it would be OpenBSD > specific. What I've found to make it stop happening is pulling the board > out and redoing the thermal paste for the CPU heatsink. I had found > some reference indicating that the alarm I got might be because of overheating. > > The difference between the boxes may be the attention to detail the factory > worker who put it together had that day. > > Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI difference > between OpenBSD and Linux. Perhaps OpenBSD runs the CPU hotter before > turning it back over to the BIOS on reboot? > > --Kurt this is great information; thanks. any idea where the temperature reference can be found? i may be able to log the cpu temperature in both operating systems in order to compare ...
Re: requesting help working around boot failures with supermicro atom board
Kurt Mosiejczuk se.rit.edu> writes: > > On Mon, Sep 14, 2015 at 05:15:01PM +, Dewey Hylton wrote: > > > > I've had this issue with the same systems. Never guessed it would > > > be OpenBSD specific. What I've found to make it stop happening is > > > pulling the board out and redoing the thermal paste for the CPU > > > heatsink. I had found some reference indicating that the alarm I > > > got might be because of overheating. > > > > The difference between the boxes may be the attention to detail the > > > factory worker who put it together had that day. > > > > Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI > > > difference between OpenBSD and Linux. Perhaps OpenBSD runs the CPU > > > hotter before turning it back over to the BIOS on reboot? > > > this is great information; thanks. any idea where the temperature > > reference can be found? > > I don't remember, it was at least a couple years ago. It was only one > reference too. Most talked about listening to beep codes, which this > wasn't really beep codes... > > > i may be able to log the cpu temperature in both operating systems in > > order to compare ... > > Possibly, but noticed I said "before turning it back over to the BIOS". > If it's a difference in OS shutdown, it will be difficult to log the > temperature. > > --Kurt understood, but i did uncover something that might provide a hint ... 
i haven't duplicated the results more than half a dozen times, but so far it's been consistent: after first booting openbsd, i see the following output:

# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=31.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.50 degC
hw.sensors.lm1.temp2=36.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.50 degC
hw.sensors.lm1.temp2=36.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=31.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.50 degC
hw.sensors.lm1.temp2=36.00 degC
# reboot

and meet with success ... if i wait just a few minutes (2) i end up with this:

# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.00 degC
hw.sensors.lm1.temp2=35.50 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# reboot

BEEEP!

again, not a very scientific/exacting approach, but half a dozen times i've seen the same results. i don't know what it is that trips up the sensors, but that's when i seem to have the issue.
now, this is running the 5.4 installation (i downgraded at someone's suggestion for testing) and i can easily reinstall from current snapshot to see if this may be an unrelated bug. but until then, does this scenario make sense to anyone?
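The pattern described above (lm1 readings collapsing to 0 and 14 degC shortly before a failing reboot) could be checked before rebooting with something like the sketch below. This is only an illustration, not something posted in the thread: the function name flag_bad_temps and the 20 degC floor are assumptions chosen to separate the healthy readings (35 to 53 degC) from the bogus ones shown above.

```shell
#!/bin/sh
# flag_bad_temps [FLOOR]
# Reads sysctl-style sensor lines on stdin and prints every lm(4)
# temperature that is at or below FLOOR (default 20, a hypothetical
# threshold). Exits 1 if any suspect reading was found, 0 otherwise.
flag_bad_temps() {
    awk -F'[= ]' -v floor="${1:-20}" '
        # Only look at lmN.tempM lines, e.g. hw.sensors.lm1.temp0=0.00 degC
        /^hw\.sensors\.lm[0-9]+\.temp[0-9]+=/ {
            if ($2 + 0 <= floor) {
                print "SUSPECT " $1 "=" $2
                bad = 1
            }
        }
        END { exit bad ? 1 : 0 }'
}
```

A possible use on the affected box would be `sysctl hw.sensors | flag_bad_temps || echo "sensors look wrong, expect the long beep"`. Negative readings, as later seen on -current, would also fall below the floor.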
Re: requesting help working around boot failures with supermicro atom board
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # reboot
>
> BEEEP!

Oh that is interesting. Can you try disabling the lm(4) driver in your kernel? You can do:

# config -ef /bsd
...
ukc> disable lm
254 lm0 disabled
255 lm* disabled
256 lm* disabled
ukc> quit
Saving modified kernel.
# reboot

That reboot will probably still hang. But it'd be interesting to see if any subsequent reboots work better.
Re: requesting help working around boot failures with supermicro atom board
Mike Larkin azathoth.net> writes:
>
> On Fri, Sep 11, 2015 at 06:38:23PM -0400, dewey.hylton gmail.com wrote:
> > hi all. i’m having difficulty with this board:
> >
> > Supermicro X7SPE-HD-D525 rev1
> >
> > i have several similar systems, each running an older version of OpenBSD
> > for a few years without incident. except this one …
> >
> > running OpenBSD 5.7 i386, from cold start it boots just fine and runs
> > until rebooted. once rebooted, however, prior to anything being displayed
> > (i assume this is early in the bios post phase) i get one very long beep.
> > super micro tells me this indicates inability to correctly initialize the
> > memory. okay, so i’ve changed memory for known working components and have
> > the same issue. at this point, the only thing that gets me booting again
> > is to remove power and then restore power. it then boots fine from cold
> > start, and fails on the next reboot (as in, “reboot” from the command
> > line). once in long-beep failure mode, neither the hardware reset button
> > nor the power button can make the machine boot again. the only thing that
> > works is removing power. every once in a while it will reboot
> > successfully, only to fail in the same manner on the next attempt.
> >
> > super micro has had me flash bios, clear cmos, boot from different devices
> > and with nothing connected, etc. the results are the same: when rebooting
> > from openbsd, next boot fails until power is removed/restored. super micro
> > blames openbsd.
> >
> > i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a
> > reboot every 5 minutes and left it overnight. i logged 554 successful
> > reboots.
> >
> > i have since installed the latest available openbsd amd64 snapshot, and am
> > seeing the same failures.
> >
> > i’m wondering if something could be disabled (boot -c ?) or if something
> > else raises a red flag and might have a workaround. this has me stumped. i
> > would very much appreciate a clue stick.
> >
> > dmesg follows:
> >
>
> acpidump please.
my pleasure: [demime removed a uuencoded section named supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines]
Re: requesting help working around boot failures with supermicro atom board
Dewey Hylton gmail.com> writes: > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=31.00 degC > hw.sensors.lm1.temp0=48.00 degC > hw.sensors.lm1.temp1=52.50 degC > hw.sensors.lm1.temp2=36.00 degC > # reboot > > and meet with success ... if i wait just a few minutes (2) i end up with this: > > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=30.00 degC > hw.sensors.lm1.temp0=48.00 degC > hw.sensors.lm1.temp1=52.00 degC > hw.sensors.lm1.temp2=35.50 degC > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=30.00 degC > hw.sensors.lm1.temp0=0.00 degC > hw.sensors.lm1.temp1=14.00 degC > hw.sensors.lm1.temp2=14.00 degC > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=30.00 degC > hw.sensors.lm1.temp0=0.00 degC > hw.sensors.lm1.temp1=14.00 degC > hw.sensors.lm1.temp2=14.00 degC > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=30.00 degC > hw.sensors.lm1.temp0=0.00 degC > hw.sensors.lm1.temp1=14.00 degC > hw.sensors.lm1.temp2=14.00 degC > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=30.00 degC > hw.sensors.lm1.temp0=0.00 degC > hw.sensors.lm1.temp1=14.00 degC > hw.sensors.lm1.temp2=14.00 degC > # sysctl -a|grep 'sensors.*temp' > hw.sensors.cpu0.temp0=30.00 degC > hw.sensors.lm1.temp0=0.00 degC > hw.sensors.lm1.temp1=14.00 degC > hw.sensors.lm1.temp2=14.00 degC > # reboot > > BEEEP! > > again, not a very scientific/exacting approach, but half a dozen times i've > seen the same results. i don't know what it is that trips up the sensors, > but that's when i seem to have the issue. > > now, this is running the 5.4 installation (i downgraded at someone's > suggestion for testing) and i can easily reinstall from current snapshot to > see if this may be an unrelated bug. > > but until then, does this scenario make sense to anyone? i now have a fresh install of current/amd64. 
the snapshot appears to be a bit on the broken side, as some of the libcrypto/libssl stuff is missing (this on both i386 and amd64 snapshots) and this prevents me from logging in via ssh and copy/pasting from terminal. i see this has been reported on the list already, so a newer snapshot (tomorrow?) may fix this. but still i have several -current boots under my belt, and the sysctl temperature thing appears to be similar to what it was with the 5.4 installation. the lm temps are showing negative numbers instead of 0 and 14, but once that happens the box fails to reboot properly as before. one other thing i've noticed now that i've reinstalled so many times in the past few days: it does not matter how long i've been booted into the ramdisk kernel for installation or whatever - it can sit for hours, and always reboots properly. no support for sensors in the ramdisk kernel. coincidence?
Re: requesting help working around boot failures with supermicro atom board
Mark Kettenis xs4all.nl> writes: > > > # sysctl -a|grep 'sensors.*temp' > > hw.sensors.cpu0.temp0=30.00 degC > > hw.sensors.lm1.temp0=0.00 degC > > hw.sensors.lm1.temp1=14.00 degC > > hw.sensors.lm1.temp2=14.00 degC > > # reboot > > > > BEEEP! > > Oh that is interesting. Can you try disabling the lm(4) driver in > your kernel? You can do: > > # config -ef /bsd > ... > ukc> disable lm > 254 lm0 disabled > 255 lm* disabled > 256 lm* disabled > ukc> quit > Saving modified kernel. > # reboot > > That reboot will probably still hang. But it'd be interesting to see > if any subsequent reboots work better. *this* interests me, and was basically what i was asking in the original post - except i had no idea what might need to be disabled. one step at a time, it's been interesting the things that have popped up. still no idea whether this has anything to do with the seemingly openbsd-only issue, but ... i made this change, booted the new kernel, ran 'cksum /dev/mem' a bunch of times in hopes of raising the temperature somewhat (did get to 36C, which is higher than in my previous tests). then i rebooted, and the box came back up without incident. so i'm going to run through this several times with reboots in every 20 minutes or so and see if it survives the night.
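A repeated-reboot test like the one described above (and the overnight Linux run that logged 554 reboots) needs some record of how many boots actually succeeded. The following is a minimal sketch of one way to keep that record; the log path and the function names record_boot and count_boots are hypothetical, and nothing here was posted in the thread.

```shell
#!/bin/sh
# Hypothetical log location; override by setting BOOT_LOG.
BOOT_LOG="${BOOT_LOG:-/var/log/reboot-test.log}"

# Append one timestamped line per successful boot.
record_boot() {
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) boot ok" >> "$BOOT_LOG"
}

# Print the number of successful boots recorded so far.
count_boots() {
    [ -f "$BOOT_LOG" ] || { echo 0; return; }
    grep -c 'boot ok$' "$BOOT_LOG"
}

# Assumed usage: call record_boot from /etc/rc.local, then schedule the
# next reboot by hand or with something like "shutdown -r +20".
```

If the count stops increasing overnight, the box most likely hung in the long-beep state on the reboot following the last logged line.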
Re: requesting help working around boot failures with supermicro atom board
Kurt Mosiejczuk [kurt-open...@se.rit.edu] wrote: > > Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI difference > between OpenBSD and Linux. Perhaps OpenBSD runs the CPU hotter before > turning it back over to the BIOS on reboot? > OpenBSD 5.8-current enters deeper C states than the ACPI describes. The ACPI documentation is not to be taken literally, according to Intel. So now OpenBSD's behavior should be similar to Linux in this regard, and therefore, you'll see lower CPU temperatures. (With the C states and mwait features that are enabled today, your temps are already lower than previous releases.)
Re: requesting help working around boot failures with supermicro atom board
On Mon, Sep 14, 2015 at 05:15:01PM +, Dewey Hylton wrote: > > I've had this issue with the same systems. Never guessed it would > > be OpenBSD specific. What I've found to make it stop happening is > > pulling the board out and redoing the thermal paste for the CPU > > heatsink. I had found some reference indicating that the alarm I > > got might be because of overheating. > > The difference between the boxes may be the attention to detail the > > factory worker who put it together had that day. > > Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI > > difference between OpenBSD and Linux. Perhaps OpenBSD runs the CPU > > hotter before turning it back over to the BIOS on reboot? > this is great information; thanks. any idea where the temperature > reference can be found? I don't remember, it was at least a couple years ago. It was only one reference too. Most talked about listening to beep codes, which this wasn't really beep codes... > i may be able to log the cpu temperature in both operating systems in > order to compare ... Possibly, but noticed I said "before turning it back over to the BIOS". If it's a difference in OS shutdown, it will be difficult to log the temperature. --Kurt
Re: requesting help working around boot failures with supermicro atom board
On Fri, Sep 11, 2015 at 06:38:23PM -0400, dewey.hyl...@gmail.com wrote:
> hi all. i’m having difficulty with this board:
>
> Supermicro X7SPE-HD-D525 rev1
>
> i have several similar systems, each running an older version of OpenBSD for
> a few years without incident. except this one …
>
> running OpenBSD 5.7 i386, from cold start it boots just fine and runs until
> rebooted. once rebooted, however, prior to anything being displayed (i assume
> this is early in the bios post phase) i get one very long beep. super micro
> tells me this indicates inability to correctly initialize the memory. okay,
> so i’ve changed memory for known working components and have the same
> issue. at this point, the only thing that gets me booting again is to remove
> power and then restore power. it then boots fine from cold start, and fails
> on the next reboot (as in, “reboot” from the command line). once in
> long-beep failure mode, neither the hardware reset button nor the power
> button can make the machine boot again. the only thing that works is removing
> power. every once in a while it will reboot successfully, only to fail in the
> same manner on the next attempt.
>
> super micro has had me flash bios, clear cmos, boot from different devices
> and with nothing connected, etc. the results are the same: when rebooting
> from openbsd, next boot fails until power is removed/restored. super micro
> blames openbsd.
>
> i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a
> reboot every 5 minutes and left it overnight. i logged 554 successful reboots.
>
> i have since installed the latest available openbsd amd64 snapshot, and am
> seeing the same failures.
>
> i’m wondering if something could be disabled (boot -c ?) or if something
> else raises a red flag and might have a workaround. this has me stumped. i
> would very much appreciate a clue stick.
>
> dmesg follows:
>

acpidump please.
> OpenBSD 5.8-current (GENERIC.MP) #1364: Wed Sep 9 17:32:01 MDT 2015
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4277665792 (4079MB)
> avail mem = 4144070656 (3952MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
> acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4)
> USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4)
> P0P6(S4) P0P7(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu0: 512KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 199MHz
> cpu0: mwait min=64, max=64, C-substates=0.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu1: 512KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 1 (application processor)
> cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu2: 512KB 64b/line 8-way L2 cache
> cpu2: smt 1, core 0, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
> cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu3: 512KB 64b/line 8-way L2 cache
> cpu3: smt 1, core 1, package 0
> ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
> ioapic0: misconfigured as apic 1, remapped to apid 4
> acpimcfg0 at acpi0 addr 0xe000, bus 0-255
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 4 (P0P1)
> acpiprt2 at acpi0: bus 1 (P0P4)
> acpiprt3 at acpi0: bus 2 (P0P8)
> acpiprt4 at acpi0: bus 3 (P0P9)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicpu2 at acpi0: C1(@1 halt!)
> acpicpu3 at acpi0: C1(@1 halt!)
> acpibtn0 at acpi0: SLPB
> acpibtn1 at acpi0: PWRB
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0
Re: requesting help working around boot failures with supermicro atom board
On Sun, Sep 13, 2015 at 10:15 AM, Sonic wrote:
> I also have this issue with OpenBSD on this box. Every time I reboot
> after updating a snapshot I need to power cycle to eliminate the long
> beep error. For some reason I kept thinking it was due to my replacing
> the stock PSU with a picoPSU for silent operation as a BIOS upgrade
> did not solve the issue. Never had this problem with the previous
> generation D510 based systems, only this D525 based version.

My mistake - the board I have trouble with is the X7SPE-HF-D525 and not the X7SPA-HF-D525.
Re: requesting help working around boot failures with supermicro atom board
My mistake - the board I have trouble with is the X7SPE-HF-D525 and not the X7SPA-HF-D525.

On Sun, Sep 13, 2015 at 11:23 AM, Sonic wrote:
> On Sun, Sep 13, 2015 at 10:15 AM, Sonic wrote:
>> I also have this issue with OpenBSD on this box. Every time I reboot
>> after updating a snapshot I need to power cycle to eliminate the long
>> beep error. For some reason I kept thinking it was due to my replacing
>> the stock PSU with a picoPSU for silent operation as a BIOS upgrade
>> did not solve the issue. Never had this problem with the previous
>> generation D510 based systems, only this D525 based version.
>
> My mistake - the board I have trouble with is the X7SPE-HF-D525 and
> not the X7SPA-HF-D525.
Re: requesting help working around boot failures with supermicro atom board
Never had any reboot issues on two X7SPA-HF-D525 (Bios R1.2b) and i'm updating/rebooting pretty often the last few weeks. On Sun, Sep 13, 2015 at 10:15:31AM -0400, Sonic wrote: > On Sat, Sep 12, 2015 at 11:02 PM,wrote: > > X7SPA-HF-D525 > > I also have this issue with OpenBSD on this box. Every time I reboot > after updating a snapshot I need to power cycle to eliminate the long > beep error. For some reason I kept thinking it was due to my replacing > the stock PSU with a picoPSU for silent operation as a BIOS upgrade > did not solve the issue. Never had this problem with the previous > generation D510 based systems, only this D525 based version. > > CHris > -- Mark Patruck ( mark at wrapped.cx ) GPG key 0xF2865E51 / 187F F6D3 EE04 1DCE 1C74 F644 0D3C F66F F286 5E51 http://www.wrapped.cx
Re: requesting help working around boot failures with supermicro atom board
On Sat, Sep 12, 2015 at 11:02 PM,wrote: > X7SPA-HF-D525 I also have this issue with OpenBSD on this box. Every time I reboot after updating a snapshot I need to power cycle to eliminate the long beep error. For some reason I kept thinking it was due to my replacing the stock PSU with a picoPSU for silent operation as a BIOS upgrade did not solve the issue. Never had this problem with the previous generation D510 based systems, only this D525 based version. CHris
Re: requesting help working around boot failures with supermicro atom board
Any thermal settings in the bios? CPU performance, Fan Speed etc. Does the fan idle correctly? Often intel chipsets will throttle the fan during a bios test. Perhaps ACPI is not routing an interrupt?

Regards
Patrick

> On Sep 11, 2015, at 5:38 PM, dewey.hyl...@gmail.com wrote:
>
> hi all. i’m having difficulty with this board:
>
> Supermicro X7SPE-HD-D525 rev1
>
> [...]
Re: requesting help working around boot failures with supermicro atom board
> i have indeed disabled quick/quiet boot options to no avail. i've also tried
> failsafe mode, ide vs ahci, acpi v1/2/3. the issue does not present with
> linux, which makes me wonder whether the openbsd kernel is somehow making
> some kind of hardware setting change that is not cleared on reboot. despite
> this only presenting in openbsd, i still blame hardware - but am hoping
> there might be some openbsd-related tweak.
>
> thanks for the idea.

Hi Dewey,

On my X7SPA-HF-D525 system quick boot has never been enabled and the controller is set to AHCI mode. The storage device never mattered either: the problem has manifested itself over the years whether the system ran from a USB flash stick or from an HDD (I have run both for long periods).

What I have to make clear is that not every reboot leads to this condition here, only a reboot after the system has been running for a considerably longer time. Typically I run the system for a while (as long as possible / needed) between snapshot upgrades, as it's in use 24/7 behind a true sine wave UPS. I have ruled out the power supply and the memory, and there are no peripherals attached.

After the system has been running for a while I usually download the sets from a mirror and rsync them to local storage, then issue a reboot. There is a pretty high chance the system will NOT boot at all, exactly as you're reporting, but it does go into the reboot process OK, cleanly exiting the OS and doing a reset. It goes into the early stages of the POST and can not complete it, but the system remains accessible over the IPMI BMC and can be power cycled etc. with an IPMI over LAN utility, and via the web based interface on the BMC as well.

The system can not boot up properly once it enters this condition: on an IPMI power cycle or power off/on, the POST goes into a repeating pattern of a long beep (~5-7s) followed by silence (~1s), which signals a memory error, but it's not the memory's fault. In this condition the IPMI can not be used to reset the system, only to power it off/on or power cycle it.

My strongest presumption, however, is that the cause is in the BIOS POST or in an IPMI hook related to the POST, and I would like that taken further with Supermicro if the OS factor is ruled out as well.

The system can only be brought back by the PSU breaker switch or by disconnecting and reconnecting the power (mains) cable for 5s. Once up, the system boots, passes through the upgrade OK, can be rebooted and the problem IS NOT present. Several reboots in a row work OK (tested), so it's not caused by an unclean OS exit. It works for several reboot cycles / upgrades etc. until you leave it running for a longer period of time. This may be what you are seeing with the Linux reboot cycle test script.

The system runs for a long while with no issues, and after getting it rebooted, no matter how (over SSH, local keyboard, serial cable, serial over LAN with the IPMI tool, or the IPMI web based tool), it gets into this flawed state where it can not pass the POST.

So the system is unsuitable for placement in a data centre without a PDU that has a real disconnect feature.

I have never run Linux on this box and can not do so (live system in production, no spares or budget for this), but I would recommend that you try a longer run with Linux and see if you can trigger this independent of the OS.

The most important issue for me is to know if it is OS dependent or not, as this will be very valuable in bringing it back to Supermicro, or alternatively in comparing the reboot state between OpenBSD and another OS.

Thank you for your tests and perseverance on this, much appreciated.

Regards,
Anton
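For anyone who wants to repeat the out-of-band steps mentioned above (checking the power state and power cycling through the BMC over the LAN), here is a rough sketch around ipmitool. The host name, user name, interface type and the DRY_RUN switch are all assumptions for illustration, not values from this thread, and older BMC firmware may only accept the "lan" interface. Note that, as reported above, a power cycle issued this way did not clear the long-beep state on the affected board; the sketch only shows how the commands are issued.

```sh
#!/bin/sh
# Sketch: out-of-band power control through the BMC with ipmitool.
# BMC_HOST, BMC_USER and BMC_IFACE are placeholders (assumptions).
# The password is taken from the IPMI_PASSWORD environment variable (-E).
BMC_HOST=${BMC_HOST:-bmc.example.net}
BMC_USER=${BMC_USER:-ADMIN}
BMC_IFACE=${BMC_IFACE:-lanplus}   # older BMC firmware may need "lan"

# Run one ipmitool command against the BMC, or only print it when DRY_RUN=1.
bmc() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ipmitool -I $BMC_IFACE -H $BMC_HOST -U $BMC_USER -E $*"
    else
        ipmitool -I "$BMC_IFACE" -H "$BMC_HOST" -U "$BMC_USER" -E "$@"
    fi
}

# Example invocations:
#   bmc chassis power status
#   bmc chassis power cycle
```

Setting DRY_RUN=1 first is a cheap way to check the command line before pointing it at a production machine.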
Re: requesting help working around boot failures with supermicro atom board
Sonic gmail.com> writes:

> On Sun, Sep 13, 2015 at 10:15 AM, Sonic gmail.com> wrote:
> > I also have this issue with OpenBSD on this box. Every time I reboot
> > after updating a snapshot I need to power cycle to eliminate the long
> > beep error. For some reason I kept thinking it was due to my replacing
> > the stock PSU with a picoPSU for silent operation as a BIOS upgrade
> > did not solve the issue. Never had this problem with the previous
> > generation D510 based systems, only this D525 based version.
>
> My mistake - the board I have trouble with is the X7SPE-HF-D525 and
> not the X7SPA-HF-D525.

this is the same board i have (X7SPE-HF-D525). what board and bios revision do you have? my board is rev 1.0 and bios is 1.2b.
Re: requesting help working around boot failures with supermicro atom board
> Whether they are identical or not, showing us a dmesg diff with a known
> working release booted from both a working and the non-working system
> could also be helpful.

Another Supermicro X7SPA-HF-D525 board (same chipset/CPU combination) has been having the same issue since early 2011 (the entire life span of the system), always running a recent snapshot:

http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA-HF-D525.cfm

There is absolutely no sense in running a 2010 snapshot now, except for experiments as suggested by Benny.

Please try and see if it makes a difference comparing reboots of the system over SSH, on the local console, over serial console, and with serial over LAN. I can provide test results over all these methods given some tech suggestion / solution.

You may want DMI / PCI / ACPI dumps; let me know if and how you want these from my system too (attach brief newbie instructions please).

So far nothing has solved it for me either, except a power cycle via the PSU breaker each time this happens. I never tried any other OS except OpenBSD and thought it was a hardware fault (memory, by the beep code), but it's not (confirmed with long runs of memtest). The system runs for very long intervals without any other issues, except the reboot behaviour in the original post, confirming the same problem. Running the latest BIOS and IPMI firmware.

Thanks, Dewey, for testing this more extensively than I had the nerve to.
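Since newbie instructions for the dumps were asked for, here is one possible way to collect them on OpenBSD, offered as a sketch rather than an official procedure. It assumes it is run as root, that pcidump(8) and acpidump(8) from the base system are usable on the machine, and that dmidecode has been installed as a package (an assumption; it is not part of the base system). The helper names save and collect_dumps and the output directory name are made up for this example.

```sh
#!/bin/sh
# Sketch: gather hardware dumps into one directory for posting to the list.

# Run a tool only if it is installed; save its stdout and stderr to a file.
save() {
    file=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$file" 2>&1 || echo "$1: exited with an error" >> "$file"
    else
        echo "$1: not installed, skipped" > "$file"
    fi
}

# Collect dmesg, sensor values, PCI, DMI and ACPI information.
collect_dumps() {
    out=${1:-hwdump}                 # output directory (hypothetical name)
    mkdir -p "$out" || return 1
    save "$out/dmesg.txt"     dmesg
    save "$out/sensors.txt"   sysctl hw.sensors
    save "$out/pcidump.txt"   pcidump -v
    save "$out/dmidecode.txt" dmidecode
    save "$out/acpidump.log"  acpidump -o "$out/acpi"
}

# Example invocation (as root):
#   collect_dumps /tmp/hwdump && tar czf hwdump.tgz -C /tmp hwdump
```

Tools that are missing are recorded as skipped instead of aborting the run, so the same script can be used on machines with different packages installed.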
Re: requesting help working around boot failures with supermicro atom board
Richard Laysell xiphosura.co.uk> writes:

> On Fri, 11 Sep 2015 18:38:23 -0400 (EDT)
> "dewey.hylton gmail.com" gmail.com> wrote:
>
> > hi all. i’m having difficulty with this board:
> >
> > Supermicro X7SPE-HD-D525 rev1
> >
> > i have several similar systems, each running an older version of
> > OpenBSD for a few years without incident. except this one …
>
> Do you have Quick Boot enabled in the BIOS? If so, try disabling it.
>
> I have known this cause problems (on other boards - no experience with
> this one). Quick Boot seems to do a quick and dirty setup and
> doesn't fully initialise all of the devices. This may be why you are
> seeing it boot OK if you remove the power - the devices then either get
> reset to their default states or the BIOS has to set them up.
>
> Regards,
>
> Richard

i have indeed disabled quick/quiet boot options to no avail. i've also tried failsafe mode, ide vs ahci, acpi v1/2/3. the issue does not present with linux, which makes me wonder whether the openbsd kernel is somehow making some kind of hardware setting change that is not cleared on reboot. despite this only presenting in openbsd, i still blame hardware - but am hoping there might be some openbsd-related tweak.

thanks for the idea.
Re: requesting help working around boot failures with supermicro atom board
wrant.com> writes:

> > Whether they are identical or not, showing us a dmesg diff with a known
> > working release booted from both a working and the non-working system
> > could also be helpful.
>
> Another Supermicro X7SPA-HF-D525 board (same chipset/CPU combination)
> has been having the same issue since early 2011 (the entire life span
> of the system), always running a recent snapshot:
>
> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA-HF-D525.cfm
>
> There is absolutely no sense in running a 2010 snapshot now, except for
> experiments as suggested by Benny.
>
> Please try and see if it makes a difference comparing reboots of the
> system over SSH, on the local console, over serial console, and with
> serial over LAN. I can provide test results over all these methods
> given some tech suggestion / solution.
>
> You may want to have a DMI / PCI / ACPI dumps, let me know how / if you
> want these from my system too (attach brief newbie instructions please).
>
> So far, nothing has solved it yet for me too, except power cycle via the
> PSU breaker each time this happens. Never tried any other OS except
> OpenBSD, thought it was the hardware (memory by the beep code) fault,
> but it's not (confirmed with long runs of memtest). The system
> runs for very long intervals without any other issues, except the
> reboot behaviour in the original post, confirming same problem. Running
> latest BIOS and IPMI firmwares.
>
> Thanks, Dewey for testing this more extensively than I had the nerve to.

this is great information, and i've passed it along to supermicro as a "i'm not the only one with this issue" datapoint. since i'm apparently not alone, this looks more like a board/firmware design issue - just curious why none of my other boards have this issue.

i'm not knowledgeable enough to provide newbie instructions on the dumps, but if doing this makes sense to anyone here with more hardware experience i'm certainly willing to try it.

regarding the comparison of reboots, do you mean testing reboots initiated via different means (ssh/console/etc.)? if so, i've done that and ruled them out. last test was a shell script with sleep/reboot commands which executed via rc.local - meaning there was no user login prior to the failure.

thanks for this information.
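The rc.local test described above might look roughly like the following. This is a reconstruction under assumptions, not the actual script used: the counter file path, the log file name, the 300 second default delay and the function names are all invented for the example.

```sh
#!/bin/sh
# Sketch: unattended reboot loop, meant to be started from /etc/rc.local.
# COUNT_FILE is a hypothetical location for the boot counter.
COUNT_FILE=${COUNT_FILE:-/var/log/reboot-test.count}

# Increment the boot counter stored in $COUNT_FILE and print the new value.
bump_count() {
    n=0
    if [ -r "$COUNT_FILE" ]; then
        n=$(cat "$COUNT_FILE")
    fi
    n=$((n + 1))
    echo "$n" > "$COUNT_FILE"
    echo "$n"
}

# Log this boot, wait, then reboot again without any user login involved.
reboot_loop() {
    delay=${1:-300}                  # seconds to stay up before rebooting
    echo "boot #$(bump_count) at $(date)" >> "$COUNT_FILE.log"
    sleep "$delay"
    reboot
}

# In /etc/rc.local (after the definitions above):
#   reboot_loop 300 &
```

The last line of the log then shows how many boots succeeded before the machine stopped coming back. Given the reports in this thread that the failure depends on uptime, a much longer delay may be needed to reproduce it.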
Re: requesting help working around boot failures with supermicro atom board
On Fri, 11 Sep 2015 18:38:23 -0400 (EDT) "dewey.hyl...@gmail.com" wrote:

> hi all. i’m having difficulty with this board:
>
> Supermicro X7SPE-HD-D525 rev1
>
> i have several similar systems, each running an older version of
> OpenBSD for a few years without incident. except this one …

Do you have Quick Boot enabled in the BIOS? If so, try disabling it.

I have known this to cause problems (on other boards - no experience with this one). Quick Boot seems to do a quick and dirty setup and doesn't fully initialise all of the devices. This may be why you are seeing it boot OK if you remove the power - the devices then either get reset to their default states or the BIOS has to set them up.

Regards,

Richard
Re: requesting help working around boot failures with supermicro atom board
Benny Lofgren lofgren.biz> writes:

> Hi Dewey,
>
> On 2015-09-12 00:38, dewey.hylton gmail.com wrote:
> > hi all. i’m having difficulty with this board:
>
> I noticed your mail somehow got posted twice, but I'm commenting on the
> first incarnation of it because the second had some characters like '\''
> mangled (UTF-8 copy/paste issue I presume).

i posted first via my normal zimbra server, which seemed to have gotten hung up so i copied/pasted into the gmail web interface. i think the second to show up may have been the original (via zimbra). no idea why there's an issue. i only get daily digests via email so i'm posting this via the gmane interface; if this doesn't work correctly i'll have to sit down and figure out the best way to do this in the future.

> > Supermicro X7SPE-HD-D525 rev1
> > i have several similar systems, each running an older version of OpenBSD for a few years without incident.
> > except this one …
>
> You might already have tried this, but providing this information may
> give important clues to the rest of us trying to help you:
>
> Since you say that your other similar systems are successfully running
> older versions of OpenBSD, have you tried running this new system with a
> version that you know works on the other boards?

i just installed 5.4 on this board, just as is running on its original cluster mate (which is still running fine). same issue.

> And if so, then have you tried moving on to subsequent versions in turn
> until you find the one which breaks? That is a really important piece of
> information.
>
> Also, are those other systems "similar" or "identical"? If not
> identical, what differs? This is also important to get a grip on the
> problem.

these are identical in all ways, except for the current bios version (which supermicro had me update when troubleshooting). i'll attempt to back it down to the same version present on the working board and see where that gets me.

> Whether they are identical or not, showing us a dmesg diff with a known
> working release booted from both a working and the non-working system
> could also be helpful.

i'll post the diff below.

> Regards,
>
> /Benny

thanks for your input. i was shocked to find that linux didn't produce this issue as well; i don't understand how a failure for a board to post could have anything to do with an os which is not running during the post. that may just show how much i (don't) understand the hardware and bios side of things.
Re: requesting help working around boot failures with supermicro atom board
John E.P. Hynes hytronix.com> writes:

> Try booting the SP kernel and see if that works. If it does, you might
> be running into a variant of an issue I've had on my SuperMicro boxen...
>
> -John

john, i tried this (5.4 bsd.sp) and i'm seeing the same result. it didn't occur to me to try this; thanks for the idea.
Re: requesting help working around boot failures with supermicro atom board
Dewey Hylton gmail.com> writes:

> > Whether they are identical or not, showing us a dmesg diff with a known
> > working release booted from both a working and the non-working system
> > could also be helpful.
>
> i'll post the diff below.

the only real differences i see are:

1) bios revision
2) secondary disk attached to different sata port
3) sensors only present on working machine

i'm having a hard time finding the older bios on the supermicro site, so i'm reaching out to them. i'm ignoring the disk channel difference. i'm looking into the sensors - haven't found that in the bios yet.

here's the diff:

$ diff inf1.dmesg.54i inf2.dmesg.54i.good
8,9c8,9
< bios0 at mainbus0: AT/286+ BIOS, date 07/19/13, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.6 @ 0x9ac00 (19 entries)
< bios0: vendor American Megatrends Inc. version "1.2b" date 07/19/13
---
> bios0 at mainbus0: AT/286+ BIOS, date 02/21/12, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.6 @ 0x9ac00 (19 entries)
> bios0: vendor American Megatrends Inc. version "1.2a" date 02/21/12
18c18
< cpu0: apic clock running at 200MHz
---
> cpu0: apic clock running at 199MHz
20c20
< cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.81 GHz
---
> cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
23c23
< cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.81 GHz
---
> cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
26c26
< cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.81 GHz
---
> cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
43c43
< bios0: ROM list: 0xc/0x8000 0xc8000/0x1000
---
> bios0: ROM list: 0xc/0x8000
57c57
< em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 0c:c4:7a:54:90:8e
---
> em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:25:90:97:49:e0
60c60
< em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 0c:c4:7a:54:90:8f
---
> em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:25:90:97:49:e1
75c75
< sd0 at scsibus0 targ 0 lun 0: SCSI3 0/direct fixed naa.500a07510920882c
---
> sd0 at scsibus0 targ 0 lun 0: SCSI3 0/direct fixed naa.500a075109208807
77c77
< sd1 at scsibus0 targ 2 lun 0: SCSI3 0/direct fixed naa.50025388500930cc
---
> sd1 at scsibus0 targ 5 lun 0: SCSI3 0/direct fixed naa.50025388a01274c6
107a108
> lm2 at wbsio0 port 0xca0/8: W83627DHG
109a111
> lm1: disabling sensors due to alias with lm2
118a121,138
> uhub8 at uhub5 port 1 "ATEN International product 0x8021" rev 1.10/1.00 addr 2
> uhidev2 at uhub8 port 1 configuration 1 interface 0 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev2: iclass 3/1
> ukbd1 at uhidev2: 8 variable keys, 6 key codes
> wskbd2 at ukbd1 mux 1
> wskbd2: connecting to wsdisplay0
> uhidev3 at uhub8 port 1 configuration 1 interface 1 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev3: iclass 3/1, 2 report ids
> uhid0 at uhidev3 reportid 1: input=2, output=0, feature=0
> uhid1 at uhidev3 reportid 2: input=1, output=0, feature=0
> uhidev4 at uhub8 port 1 configuration 1 interface 2 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev4: iclass 3/1
> ums1 at uhidev4: 5 buttons, Z dir
> wsmouse1 at ums1 mux 0
> uhidev5 at uhub8 port 1 configuration 1 interface 3 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev5: iclass 3/1
> ums2 at uhidev5: 3 buttons, Z dir
> wsmouse2 at ums2 mux 0
123c143,147
< root on sd0a (22cb25880c08c19f.a) swap on sd0b dump on sd0b
---
> root on sd0a (3a0229e574e4bfd1.a) swap on sd0b dump on sd0b
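Much of a diff like this is noise that differs between any two machines (MAC addresses, disk identifiers, the root disk DUID). One way to cut that down before comparing is to mask those values first. The following is only a sketch: the function names and the three patterns are assumptions based on the lines visible in this diff, and other per-machine values would need patterns of their own.

```sh
#!/bin/sh
# Sketch: diff two saved dmesg files after masking per-machine identifiers.

# Replace MAC addresses, naa.* disk IDs and root DUIDs with fixed markers.
normalize_dmesg() {
    sed -E \
        -e 's/address ([0-9a-f]{2}:){5}[0-9a-f]{2}/address <mac>/' \
        -e 's/naa\.[0-9a-f]+/naa.<id>/' \
        -e 's/\([0-9a-f]{16}\.a\)/(<duid>.a)/' \
        "$1"
}

# Diff the normalized copies; returns diff's exit status (0 means identical).
dmesg_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_dmesg "$1" > "$a"
    normalize_dmesg "$2" > "$b"
    diff "$a" "$b" && rc=0 || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example invocation with the file names used above:
#   dmesg_diff inf1.dmesg.54i inf2.dmesg.54i.good
```

With the identifiers masked, what remains should be closer to the differences listed above: the bios revision, the sata port, and the sensor lines.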
Re: requesting help working around boot failures with supermicro atom board
Hi Dewey,

On 2015-09-12 00:38, dewey.hyl...@gmail.com wrote:
> hi all. i’m having difficulty with this board:

I noticed your mail somehow got posted twice, but I'm commenting on the first incarnation of it because the second had some characters like '\'' mangled (UTF-8 copy/paste issue I presume).

> Supermicro X7SPE-HD-D525 rev1
> i have several similar systems, each running an older version of OpenBSD for
> a few years without incident. except this one …

You might already have tried this, but providing this information may give important clues to the rest of us trying to help you:

Since you say that your other similar systems are successfully running older versions of OpenBSD, have you tried running this new system with a version that you know works on the other boards?

And if so, then have you tried moving on to subsequent versions in turn until you find the one which breaks? That is a really important piece of information.

Also, are those other systems "similar" or "identical"? If not identical, what differs? This is also important to get a grip on the problem.

Whether they are identical or not, showing us a dmesg diff with a known working release booted from both a working and the non-working system could also be helpful.

Regards,

/Benny

> [...]
Re: requesting help working around boot failures with supermicro atom board
Try booting the SP kernel and see if that works. If it does, you might be running into a variant of an issue I've had on my SuperMicro boxen...

-John

On 09/11/2015 06:38 PM, dewey.hyl...@gmail.com wrote:
> hi all. i’m having difficulty with this board:
>
> Supermicro X7SPE-HD-D525 rev1
>
> [...]
requesting help working around boot failures with supermicro atom board
hi all. iâm having difficulty with OpenBSD on this board: Supermicro X7SPE-HD-D525 rev1 i have several similar systems, each running an older version of OpenBSD for a few years without incident. except this one ⦠running OpenBSD 5.7 i386 as well as latest amd64 snapshot, from cold start it boots just fine and runs until rebooted. once rebooted, however, prior to anything being displayed (i assume this is early in the bios post phase) i get one very long beep. super micro tells me this indicates inability to correctly initialize the memory. okay, so iâve changed memory for known working components and have the same issue. at this point, the only thing that gets me booting again is to remove power and then restore power. it then boots fine from cold start, and fails on the next reboot (as in, ârebootâ from the command line). once in long-beep failure mode, neither the hardware reset button nor the power button can make the machine boot again. the only thing that works is removing power. every once in a while it will reboot successfully, only to fail in the same manner on the next attempt. super micro has had me flash bios, clear cmos, boot from different devices and with nothing connected, etc. the results are the same: when rebooting from openbsd, next boot fails until power is removed/restored. super micro blames openbsd. i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a reboot every 5 minutes and left it overnight. i logged 554 successful reboots. iâm wondering if something could be disabled (boot -c ?) or if something else raises a red flag and might have a workaround. this has me stumped. i would very much appreciate a clue stick. 
dmesg follows:

OpenBSD 5.8-current (GENERIC.MP) #1364: Wed Sep 9 17:32:01 MDT 2015
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277665792 (4079MB)
avail mem = 4144070656 (3952MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, C-substates=0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu2: 512KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu3: 512KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 4
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 1 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P8)
acpiprt4 at acpi0: bus 3 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 4 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 4 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 4 int 19
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 4 int 18
usb0 at ehci0: USB revision 2.0
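
For anyone who wants to repeat the overnight reboot test from the report (a reboot every 5 minutes, 554 successes logged), here is a minimal sketch of a boot counter. It is not the script the original poster used, which was not shown. The log path and the function names `record_boot` and `count_boots` are made up for illustration. The idea is only to append one timestamp per successful boot and count the lines afterwards.

```python
#!/usr/bin/env python3
"""Boot counter sketch: append one timestamped line per boot, then count them."""
import datetime
import pathlib

# Hypothetical log location (an assumption); any path that survives a reboot works.
DEFAULT_LOG = pathlib.Path("/var/log/reboot-test.log")


def record_boot(log_path=DEFAULT_LOG, now=None):
    """Append one ISO-8601 timestamp line marking a successful boot."""
    now = now or datetime.datetime.now()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(now.isoformat(timespec="seconds") + "\n")


def count_boots(log_path=DEFAULT_LOG):
    """Return the number of boots recorded so far (0 if the log is missing)."""
    try:
        with open(log_path, encoding="utf-8") as log:
            return sum(1 for line in log if line.strip())
    except FileNotFoundError:
        return 0
```

Calling `record_boot()` from a boot-time hook and scheduling the reboot separately (for example from cron) should be enough. The reboot itself is deliberately left out of the sketch. If the count stops growing, the machine is most likely stuck in the long-beep state.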
requesting help working around boot failures with supermicro atom board
hi all. i’m having difficulty with this board: Supermicro X7SPE-HD-D525 rev1

i have several similar systems, each running an older version of OpenBSD for a few years without incident. except this one … running OpenBSD 5.7 i386, from cold start it boots just fine and runs until rebooted. once rebooted, however, prior to anything being displayed (i assume this is early in the bios post phase) i get one very long beep. super micro tells me this indicates inability to correctly initialize the memory. okay, so i’ve changed memory for known working components and have the same issue.

at this point, the only thing that gets me booting again is to remove power and then restore power. it then boots fine from cold start, and fails on the next reboot (as in, “reboot” from the command line). once in long-beep failure mode, neither the hardware reset button nor the power button can make the machine boot again. the only thing that works is removing power. every once in a while it will reboot successfully, only to fail in the same manner on the next attempt.

super micro has had me flash bios, clear cmos, boot from different devices and with nothing connected, etc. the results are the same: when rebooting from openbsd, next boot fails until power is removed/restored. super micro blames openbsd. i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a reboot every 5 minutes and left it overnight. i logged 554 successful reboots. i have since installed the latest available openbsd amd64 snapshot, and am seeing the same failures.

i’m wondering if something could be disabled (boot -c ?) or if something else raises a red flag and might have a workaround. this has me stumped. i would very much appreciate a clue stick.
dmesg follows:

OpenBSD 5.8-current (GENERIC.MP) #1364: Wed Sep 9 17:32:01 MDT 2015
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277665792 (4079MB)
avail mem = 4144070656 (3952MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, C-substates=0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu2: 512KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu3: 512KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 4
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 1 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P8)
acpiprt4 at acpi0: bus 3 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 4 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 4 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 4 int 19
ehci0 at pci0 dev 26 function 7 "Intel 82801I