Re: requesting help working around boot failures with supermicro atom board

2015-10-21 Thread lists
Synopsis: if the sensors show missing data, reset the BMC before
rebooting the system to prevent the "unable to boot, long beep" issue.

I found a reliably reproducible workaround for this problem that retains
control continuity without the need to trip the mains breaker.  This
entirely prevents the long beep issue and allows the system to be used
in headless remote environments without requiring remote mains power
cycle capability and/or remote hands intervention.

I did not have to disable the lm(4) sensor driver, as advised previously,
for this workaround, and I have concluded that the problem is not caused
by the driver itself in the first place, but by buggy BMC firmware.

For this reason it is advisable to contact Supermicro technical support
again and ask them for a reliable BMC firmware update that does not
exhibit the problem.

After running for a longer period (neither specific nor deterministic,
but above 30min), the sensors start to display wrong (missing) values
and cannot provide data points to the BMC firmware.  This is seen both
in IPMI, with direct and networked access, and in the web based
management interface.  At this point, a reboot leaves the system unable
to boot, manifesting the dreaded long beep.  Only a power cycle of the
mains (power supply breaker or power distribution unit) for a couple of
seconds unblocks the system so that it can boot successfully again.
This, however, totally undermines the remote control capabilities of
the system, effectively turning it into a continuous source of manual
intervention requests to power cycle the mains (stop and start).

The workaround is to reset the BMC before attempting to reboot the
system.  It works directly over the network via IPMI, and likewise via
the web based BMC interface.  This only reboots the IPMI controller
(not the system) and its embedded firmware; after a couple of minutes
the sensors poll actual, correct data and display it properly.  At this
point a system reboot succeeds as expected and the system boots up and
works properly, until some unspecified longer time passes again (from
1h to days) and the BMC controller gets stuck again (it certainly
will).  The indication is missing sensor data, and a reboot then fails
with the long beep.
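The reset-then-reboot procedure described above could be scripted roughly
as follows.  This is a sketch, not something tested on this board: it
assumes ipmitool is installed, reuses the placeholder host, user and
password file from the `ipmitool ... sdr` example later in this thread,
treats "System Temp" showing "no reading" as the sign of a stuck BMC, and
caps the wait for the BMC to come back at an arbitrary 10 minutes.  The
`--run` guard only exists so the functions can be sourced without side
effects.

```shell
#!/bin/sh
# Sketch: reset a stuck BMC before rebooting, instead of power cycling mains.
# Assumptions (not from the original post): ipmitool is installed, the BMC is
# reachable over lanplus with the placeholder credentials below, and
# "System Temp" is a good probe for whether the BMC sensors are alive.
BMC_HOST=${BMC_HOST:-10.10.10.10}
BMC_USER=${BMC_USER:-musr1}
BMC_PASSFILE=${BMC_PASSFILE:-.ipmipass}

ipmi() {
	ipmitool -I lanplus -U "$BMC_USER" -f "$BMC_PASSFILE" -H "$BMC_HOST" "$@"
}

# Read `ipmitool sdr` output on stdin; succeed only if the named sensor is
# listed and does not say "no reading".
sensor_has_reading() {
	awk -F'|' -v name="$1" '
		{ n = $1; gsub(/^[ \t]+|[ \t]+$/, "", n) }
		n == name { found = 1; if ($2 ~ /no reading/) bad = 1 }
		END { exit !(found && !bad) }'
}

main() {
	if ! ipmi sdr | sensor_has_reading "System Temp"; then
		echo "BMC sensors missing, resetting BMC" >&2
		ipmi mc reset cold || exit 1
		tries=0
		# Poll every 30s, at most 20 times (about 10 minutes).
		until ipmi sdr 2>/dev/null | sensor_has_reading "System Temp"; do
			tries=$((tries + 1))
			if [ "$tries" -ge 20 ]; then
				echo "BMC did not recover, not rebooting" >&2
				exit 1
			fi
			sleep 30
		done
	fi
	# Only reboot the host once the BMC reports sensor data again.
	shutdown -r now
}

if [ "${1:-}" = "--run" ]; then
	main
fi
```

Run as root on the affected host with `--run`; without it the script only
defines the functions.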

This is NOT OS specific, unless the driver polling the sensors causes
the sensor subsystem in the embedded controller OS to crash; the only
factor found to affect it so far is how long the system has been
running without a mains power cycle.  It is a flaw in the BMC firmware,
and the solution is surely to demand updated firmware without this
fault from Supermicro.  It would help if more people voiced their
concerns about this, so that an updated BMC firmware is issued by
Supermicro technical support and published on their web site.

Here is how it looks when the BMC is stuck:

$ ipmi-sensor 
System Temp  | no reading| ns
CPU Temp | no reading| ns
CPU FAN  | no reading| ns
SYS FAN  | no reading| ns
CPU Vcore| no reading| ns
Vichcore | no reading| ns
+3.3VCC  | no reading| ns
VDIMM| no reading| ns
+5 V | no reading| ns
+12 V| no reading| ns
+3.3VSB  | no reading| ns
VBAT | no reading| ns
Chassis Intru| no reading| ns
PS Status| 0x00  | ok

$ ipmi-sensor-detail  
System Temp      | na         |            | na    | na        | na        | na        | na        | na        | na
CPU Temp         | na         |            | na    | na        | na        | na        | na        | na        | na
CPU FAN          | na         |            | na    | na        | na        | na        | na        | na        | na
SYS FAN          | na         |            | na    | na        | na        | na        | na        | na        | na
CPU Vcore        | na         |            | na    | na        | na        | na        | na        | na        | na
Vichcore         | na         |            | na    | na        | na        | na        | na        | na        | na
+3.3VCC          | na         |            | na    | na        | na        | na        | na        | na        | na
VDIMM            | na         |            | na    | na        | na        | na        | na        | na        | na
+5 V             | na         |            | na    | na        | na        | na        | na        | na        | na
+12 V            | na         |            | na    | na        | na        | na        | na        | na        | na
+3.3VSB          | na         |            | na    | na        | na

Re: requesting help working around boot failures with supermicro atom board

2015-10-21 Thread Jack Peirce
I have a great relationship with some SuperMicro engineers; if others can
provide part numbers and firmware/BIOS revisions, I can bring this up with them.
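For anyone answering this request, a sketch like the following could
collect the board, BIOS and BMC firmware identification on an OpenBSD
host.  It assumes ipmitool is installed and reuses the placeholder BMC
host, user and password file from the ipmitool example later in this
thread; `hw.vendor`/`hw.product` and the `bios0:` dmesg line are what
OpenBSD reports, and the exact labels in `ipmitool mc info` output may
vary between ipmitool versions.

```shell
#!/bin/sh
# Sketch: collect board, BIOS and BMC identification for a vendor report.
# BMC_HOST, BMC_USER and BMC_PASSFILE are placeholders (the values from the
# ipmitool example in this thread); adjust them for your own BMC.
BMC_HOST=${BMC_HOST:-10.10.10.10}
BMC_USER=${BMC_USER:-musr1}
BMC_PASSFILE=${BMC_PASSFILE:-.ipmipass}

ipmi() {
	ipmitool -I lanplus -U "$BMC_USER" -f "$BMC_PASSFILE" -H "$BMC_HOST" "$@"
}

# Print the value of the "Firmware Revision" line from `ipmitool mc info`
# output read on stdin (prints nothing if the line is absent).
firmware_revision() {
	awk -F':' '/^Firmware Revision/ {
		gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

report() {
	echo "== board (sysctl) =="
	sysctl hw.vendor hw.product
	echo "== BIOS (dmesg) =="
	dmesg | grep '^bios0:'
	echo "== BMC firmware revision =="
	ipmi mc info | firmware_revision
	echo "== FRU (part numbers) =="
	ipmi fru print
}

if [ "${1:-}" = "--run" ]; then
	report
fi
```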

From: owner-m...@openbsd.org <owner-m...@openbsd.org> on behalf of li...@wrant.com <li...@wrant.com>
Sent: Wednesday, October 21, 2015 8:50 PM
To: misc@openbsd.org
Subject: Re: requesting help working around boot failures with supermicro atom board

Re: requesting help working around boot failures with supermicro atom board

2015-10-08 Thread Mike Larkin
On Wed, Oct 07, 2015 at 11:17:25PM -0400, Dewey Hylton wrote:
> you missed my update which followed that post. it did not survive the night
> - even with lm disabled in the kernel, some number of reboots later i
> encountered the same failure. that update is on the list, but i'll include
> the copy/paste below.
> 
> meanwhile, is there still hope for answers relating to acpi?
> 

I doubt it. I took a look at your AML and it seemed reasonable.

-ml




Re: requesting help working around boot failures with supermicro atom board

2015-10-08 Thread Dewey Hylton
ah, well thanks for taking a look.

On Thu, Oct 8, 2015 at 3:09 PM, Mike Larkin <mlar...@azathoth.net> wrote:

> I doubt it. I took a look at your AML and it seemed reasonable.
>
> -ml



Re: requesting help working around boot failures with supermicro atom board

2015-10-07 Thread Dewey Hylton
you missed my update which followed that post. it did not survive the night
- even with lm disabled in the kernel, some number of reboots later i
encountered the same failure. that update is on the list, but i'll include
the copy/paste below.

meanwhile, is there still hope for answers relating to acpi?

-- Forwarded message --
From: Dewey Hylton <dewey.hyl...@gmail.com>
To: misc@openbsd.org
Cc:
Date: Tue, 15 Sep 2015 19:19:10 + (UTC)
Subject: Re: requesting help working around boot failures with supermicro
atom board
Dewey Hylton  gmail.com> writes:

>
> Mark Kettenis  xs4all.nl> writes:

> > Oh that is interesting.  Can you try disabling the lm(4) driver in
> > your kernel?  You can do:
> >
> > # config -ef /bsd
> > ...
> > ukc> disable lm
> > 254 lm0 disabled
> > 255 lm* disabled
> > 256 lm* disabled
> > ukc> quit
> > Saving modified kernel.
> > # reboot
> >
> > That reboot will probably still hang.  But it'd be interesting to see
> > if any subsequent reboots work better.
>


sadly, the first thing i heard when entering the lab this morning was
BEP!

so disabling the sensor drivers in the kernel did not do the trick. without
other ideas, i'm down to providing acpidump output and hoping someone can
tell me where to go next ...


On Wed, Oct 7, 2015 at 12:41 AM, Mike Larkin <mlar...@azathoth.net> wrote:

> On Tue, Sep 15, 2015 at 02:45:02AM +, Dewey Hylton wrote:
> > Mark Kettenis  xs4all.nl> writes:
> >
> > >
> > > > # sysctl -a|grep 'sensors.*temp'
> > > > hw.sensors.cpu0.temp0=30.00 degC
> > > > hw.sensors.lm1.temp0=0.00 degC
> > > > hw.sensors.lm1.temp1=14.00 degC
> > > > hw.sensors.lm1.temp2=14.00 degC
> > > > # reboot
> > > >
> > > > BEEEP!
> > >
> > > Oh that is interesting.  Can you try disabling the lm(4) driver in
> > > your kernel?  You can do:
> > >
> > > # config -ef /bsd
> > > ...
> > > ukc> disable lm
> > > 254 lm0 disabled
> > > 255 lm* disabled
> > > 256 lm* disabled
> > > ukc> quit
> > > Saving modified kernel.
> > > # reboot
> > >
> > > That reboot will probably still hang.  But it'd be interesting to see
> > > if any subsequent reboots work better.
> >
> > *this* interests me, and was basically what i was asking in the original
> > post - except i had no idea what might need to be disabled. one step at a
> > time, it's been interesting the things that have popped up.
> >
> > still no idea whether this has anything to do with the seemingly
> > openbsd-only issue, but ...
> >
> > i made this change, booted the new kernel, ran 'cksum /dev/mem' a bunch of
> > times in hopes of raising the temperature somewhat (did get to 36C, which is
> > higher than in my previous tests). then i rebooted, and the box came back up
> > without incident.
> >
> > so i'm going to run through this several times with reboots in every 20
> > minutes or so and see if it survives the night.
> >
>
> Based on this and my previous email, my recommendation would be to disable
> lm(4) on this particular machine.



Re: requesting help working around boot failures with supermicro atom board

2015-10-07 Thread lists
Tue, 6 Oct 2015 21:41:15 -0700 Mike Larkin 
> I had thought this was acpi related earlier (before we realized that disabling
> lm* fixes it). So I have no news here, as I don't think the solution is going
> to be found in the AML.

Thanks for the update and the pointer in the right direction (regarding
disabling the lm(4) sensor driver).  Indeed, this does not happen with
bsd.rd during upgrades, and as I recall, this issue may not have been
present originally, back in 2011.

> The lm(4) sensor is probably getting wedged somehow, which is causing the bios
> to think the machine is too hot on reboot. Even though it's not.

Makes sense, as the readings are improbable after running for a while
and things look stuck somehow, including in the BMC web interface.

Side note: I know, but let's not get sidetracked by the insane design
flaws regarding security and interfaces; there are scary popcorn topics
on the list for that, so please stay on topic.

Here is the reading from a long run where the sensors appear stuck both
on the shell and in the BMC:

$ sysctl hw.sensors
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=-1.00 degC
hw.sensors.lm1.temp1=-0.50 degC
hw.sensors.lm1.temp2=-0.50 degC
hw.sensors.lm1.volt0=2.04 VDC (VCore)
hw.sensors.lm1.volt1=13.46 VDC (+12V)
hw.sensors.lm1.volt2=4.08 VDC (+3.3V)
hw.sensors.lm1.volt3=4.08 VDC (+3.3V)
hw.sensors.lm1.volt4=1.85 VDC (-12V)
hw.sensors.lm1.volt5=0.00 VDC
hw.sensors.lm1.volt6=0.00 VDC
hw.sensors.lm1.volt7=4.08 VDC (3.3VSB)
hw.sensors.lm1.volt8=2.04 VDC (VBAT)
$

And again after resetting the IPMI device: only some of the readings
look incorrect, and they are not as stuck as above:

$ sysctl hw.sensors 
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=41.00 degC
hw.sensors.lm1.temp1=42.00 degC
hw.sensors.lm1.temp2=26.00 degC
hw.sensors.lm1.volt0=1.10 VDC (VCore)
hw.sensors.lm1.volt1=6.86 VDC (+12V)
hw.sensors.lm1.volt2=3.33 VDC (+3.3V)
hw.sensors.lm1.volt3=3.33 VDC (+3.3V)
hw.sensors.lm1.volt4=-10.34 VDC (-12V)
hw.sensors.lm1.volt5=1.28 VDC
hw.sensors.lm1.volt6=1.82 VDC
hw.sensors.lm1.volt7=3.28 VDC (3.3VSB)
hw.sensors.lm1.volt8=1.57 VDC (VBAT)
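The stuck readings in the first dump (lm(4) temperatures at or below
0 degC on an indoor machine) could be flagged automatically before
deciding whether a BMC reset is needed.  A minimal sketch, assuming that
any `hw.sensors.lmN.tempM` value at or below a threshold (0 degC by
default, set through the hypothetical `LM_MIN_DEGC` variable) means the
sensor is wedged; that heuristic is an assumption, not something
established in this thread.

```shell
#!/bin/sh
# Sketch: flag implausible lm(4) temperatures in `sysctl hw.sensors` output.
# Assumption: a reading at or below LM_MIN_DEGC (default 0) on an indoor
# machine means the sensor is wedged rather than genuinely that cold.

# Read `sysctl hw.sensors` output on stdin, print suspicious lm(4)
# temperature lines and succeed (exit 0) if at least one was found.
lm_temps_implausible() {
	awk -F'=' -v min="${LM_MIN_DEGC:-0}" '
		$1 ~ /^hw\.sensors\.lm[0-9]+\.temp[0-9]+$/ {
			split($2, v, " ")
			if (v[1] + 0 <= min + 0) { print; bad = 1 }
		}
		END { exit !bad }'
}

if [ "${1:-}" = "--check" ]; then
	if sysctl hw.sensors | lm_temps_implausible; then
		echo "lm(4) readings look stuck; consider resetting the BMC" >&2
	fi
fi
```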


> I don't know a lot about the lm(4) driver so I don't think I'll be able to
> help much here. One of the things I do know about it is that sometimes you
> don't actually even have a real lm(4), and that it's simulated by some other
> component or even SMM. Maybe the manufacturer did a poor job. Shrug.

Please compare the above with the values presented in the BMC web
interface:

Name            Status   Reading
System Temp     Normal   41 degrees C
CPU Temp        Normal   42 degrees C
CPU FAN         N/A      Not Present!
SYS FAN         N/A      Not Present!
CPU Vcore       Normal   1.096 Volts
Vichcore        Normal   1.04 Volts
+3.3VCC         Normal   3.328 Volts
VDIMM           Normal   1.528 Volts
+5 V            Normal   5.12 Volts
+12 V           Normal   12.084 Volts
+3.3VSB         Normal   3.28 Volts
VBAT            Normal   3.136 Volts
Chassis Intru   OK
PS Status       Presence detected.

Here is the output from ipmitool over the network:

$ ipmitool -I lanplus -U musr1 -f .ipmipass -H 10.10.10.10 sdr 
System Temp  | 41 degrees C  | ok
CPU Temp | 42 degrees C  | ok
CPU FAN  | no reading| ns
SYS FAN  | no reading| ns
CPU Vcore| 1.10 Volts| ok
Vichcore | 1.04 Volts| ok
+3.3VCC  | 3.33 Volts| ok
VDIMM| 1.54 Volts| ok
+5 V | 5.12 Volts| ok
+12 V| 12.08 Volts   | ok
+3.3VSB  | 3.28 Volts| ok
VBAT | 3.14 Volts| ok
Chassis Intru| 0x00  | ok
PS Status| 0x00  | ok
$

Same thing with more details and thresholds (untouched from defaults;
for reference only, since this may be where the lm(4) sensor is getting
some of its funny values):

$ ipmitool -I lanplus -U musr1 -f .ipmipass -H 10.10.10.10 sensor 
System Temp      | 42.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 75.000    | 77.000    | 79.000
CPU Temp         | 42.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000
CPU FAN          | na         |            | na    | na        | na        | na        | na        | na        | na
SYS FAN          | na         |            | na    | na        | na        | na        | na        | na        | na
CPU Vcore        | 1.096      | Volts      | ok    | 0.640     | 0.664     | 0.688     | 1.344     | 1.408     | 1.472
Vichcore         | 1.040      | Volts      | ok    | 0.808     | 0.824     | 0.840     | 1.160     | 1.176     | 1.192
+3.3VCC          | 3.328      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
VDIMM            | 1.528      | Volts

Re: requesting help working around boot failures with supermicro atom board

2015-10-06 Thread Mike Larkin
On Mon, Oct 05, 2015 at 01:18:53PM -0400, dewey.hyl...@gmail.com wrote:
> unfortunately, not on my end. i have hopes that mike larkin may find something
> when he gets a chance to look, but i am past the limit of my capabilities and
> supermicro support has discontinued responding to me. their last suggestion was
> to switch to linux or windows, and their last message was of the "we'll get
> back to you" variety. 
> 

I had thought this was acpi related earlier (before we realized that disabling
lm* fixes it). So I have no news here, as I don't think the solution is going
to be found in the AML.

The lm(4) sensor is probably getting wedged somehow, which is causing the bios
to think the machine is too hot on reboot. Even though it's not.

I don't know a lot about the lm(4) driver so I don't think I'll be able to
help much here. One of the things I do know about it is that sometimes you
don't actually even have a real lm(4), and that it's simulated by some other
component or even SMM. Maybe the manufacturer did a poor job. Shrug.

Sorry, I'm out of ideas. Maybe someone else can debug it for you.

-ml




Re: requesting help working around boot failures with supermicro atom board

2015-10-06 Thread Mike Larkin
On Tue, Sep 15, 2015 at 02:45:02AM +, Dewey Hylton wrote:
> Mark Kettenis  xs4all.nl> writes:
> 
> > 
> > > # sysctl -a|grep 'sensors.*temp'
> > > hw.sensors.cpu0.temp0=30.00 degC
> > > hw.sensors.lm1.temp0=0.00 degC
> > > hw.sensors.lm1.temp1=14.00 degC
> > > hw.sensors.lm1.temp2=14.00 degC
> > > # reboot
> > > 
> > > BEEEP!
> > 
> > Oh that is interesting.  Can you try disabling the lm(4) driver in
> > your kernel?  You can do:
> > 
> > # config -ef /bsd
> > ...
> > ukc> disable lm
> > 254 lm0 disabled
> > 255 lm* disabled
> > 256 lm* disabled
> > ukc> quit
> > Saving modified kernel.
> > # reboot
> > 
> > That reboot will probably still hang.  But it'd be interesting to see
> > if any subsequent reboots work better.
> 
> *this* interests me, and was basically what i was asking in the original
> post - except i had no idea what might need to be disabled. one step at a
> time, it's been interesting the things that have popped up.
> 
> still no idea whether this has anything to do with the seemingly
> openbsd-only issue, but ...
> 
> i made this change, booted the new kernel, ran 'cksum /dev/mem' a bunch of
> times in hopes of raising the temperature somewhat (did get to 36C, which is
> higher than in my previous tests). then i rebooted, and the box came back up
> without incident.
> 
> so i'm going to run through this several times with reboots in every 20
> minutes or so and see if it survives the night.
> 

Based on this and my previous email, my recommendation would be to disable
lm(4) on this particular machine.



Re: requesting help working around boot failures with supermicro atom board

2015-10-05 Thread Sonic
On Mon, Oct 5, 2015 at 1:18 PM, dewey.hyl...@gmail.com
 wrote:
> but their non-support of this brand-new motherboard

When the other OSes work fine it does seem to point to an OpenBSD
issue, but that's not always a reliable conclusion to arrive at.
Either way it would be nice to see it resolved.

There was a time when the problem did not exist, but it's been so long
that I have no clue any longer when the change occurred that triggered
the issue.

Chris



Re: requesting help working around boot failures with supermicro atom board

2015-10-05 Thread dewey.hyl...@gmail.com
unfortunately, not on my end. i have hopes that mike larkin may find something
when he gets a chance to look, but i am past the limit of my capabilities and
supermicro support has discontinued responding to me. their last suggestion was
to switch to linux or windows, and their last message was of the "we'll get
back to you" variety. 

so on a related note, i'm on the hunt for something which can replace this
board's functionality without breaking the bank. something not supported by
supermicro, as this is a brand new board and they seem to be unwilling to 
provide support anyway. remote kvm/power is the sole purpose for choosing this
supermicro device in the first place. i have plenty of much more expensive and
more powerful supermicro devices at customer sites which do not show this
issue - but their non-support of this brand-new motherboard shows me that they
are not who i want to be relying on.

- On Oct 5, 2015, at 12:08 PM, Sonic sonicsm...@gmail.com wrote:

Any progress on this issue?




Re: requesting help working around boot failures with supermicro atom board

2015-10-05 Thread Sonic
Any progress on this issue?

On Thu, Sep 17, 2015 at 1:26 PM, Mike Larkin  wrote:
> On Thu, Sep 17, 2015 at 12:40:12PM +, Dewey Hylton wrote:
>> Dewey Hylton  gmail.com> writes:
>>
>> >
>> > Mike Larkin  azathoth.net> writes:
>> >
>> > >
>> > > On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote:
>> > > > Dewey Hylton  gmail.com> writes:
>> > > >
>> > > > >
>> > > > > Mike Larkin  azathoth.net> writes:
>> > > >
>> > > > > > acpidump please.
>>
>> > motherboard: supermicro x7spe-hf-d525 rev 1.0
>> > bios: 1.2b
>> >
>> > at the end of this link is an archive containing acpidump output for all
>> > three acpi settings in the bios (1.0, 2.0, 3.0).
>> >
>> > https://goo.gl/tWGL6C
>> >
>> > i apologize for the somewhat hidden link; gmane wouldn't allow me to post
>> > the full link because it's greater than 80 characters.
>> >
>> > please let me know if i can help in any way; i honestly know nothing about
>> > acpi but am willing to learn or assist otherwise if it means understanding
>> > and potentially fixing this issue.
>>
>> i was able to export the DSDT files into something human-readable. while i
>> don't really understand much of what i'm seeing in the resulting text files,
>> diff shows that the differences between the three acpi versions are
>> nonexistent. i have no idea about the other files, of which there are 
>> several.
>>
>> Mike, does the acpidump output help at all? if not, am i simply at the point
>> where this hardware is not compatible with OpenBSD?
>>
>
> Haven't had a chance to look at it yet.
>
> -ml



Re: requesting help working around boot failures with supermicro atom board

2015-09-17 Thread Dewey Hylton
Dewey Hylton  gmail.com> writes:

> 
> Mike Larkin  azathoth.net> writes:
> 
> > 
> > On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote:
> > > Dewey Hylton  gmail.com> writes:
> > > 
> > > > 
> > > > Mike Larkin  azathoth.net> writes:
> > > 
> > > > > acpidump please.

> motherboard: supermicro x7spe-hf-d525 rev 1.0
> bios: 1.2b
> 
> at the end of this link is an archive containing acpidump output for all
> three acpi settings in the bios (1.0, 2.0, 3.0). 
> 
> https://goo.gl/tWGL6C
> 
> i apologize for the somewhat hidden link; gmane wouldn't allow me to post
> the full link because it's greater than 80 characters.
> 
> please let me know if i can help in any way; i honestly know nothing about
> acpi but am willing to learn or assist otherwise if it means understanding
> and potentially fixing this issue.

i was able to export the DSDT files into something human-readable. while i
don't really understand much of what i'm seeing in the resulting text files,
diff shows that the differences between the three acpi versions are
nonexistent. i have no idea about the other files, of which there are several.

Mike, does the acpidump output help at all? if not, am i simply at the point
where this hardware is not compatible with OpenBSD?



Re: requesting help working around boot failures with supermicro atom board

2015-09-17 Thread Mike Larkin
On Thu, Sep 17, 2015 at 12:40:12PM +, Dewey Hylton wrote:
> Dewey Hylton  gmail.com> writes:
> 
> > 
> > Mike Larkin  azathoth.net> writes:
> > 
> > > 
> > > On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote:
> > > > Dewey Hylton  gmail.com> writes:
> > > > 
> > > > > 
> > > > > Mike Larkin  azathoth.net> writes:
> > > > 
> > > > > > acpidump please.
> 
> > motherboard: supermicro x7spe-hf-d525 rev 1.0
> > bios: 1.2b
> > 
> > at the end of this link is an archive containing acpidump output for all
> > three acpi settings in the bios (1.0, 2.0, 3.0). 
> > 
> > https://goo.gl/tWGL6C
> > 
> > i apologize for the somewhat hidden link; gmane wouldn't allow me to post
> > the full link because it's greater than 80 characters.
> > 
> > please let me know if i can help in any way; i honestly know nothing about
> > acpi but am willing to learn or assist otherwise if it means understanding
> > and potentially fixing this issue.
> 
> i was able to export the DSDT files into something human-readable. while i
> don't really understand much of what i'm seeing in the resulting text files,
> diff shows that the differences between the three acpi versions are
> nonexistent. i have no idea about the other files, of which there are several.
> 
> Mike, does the acpidump output help at all? if not, am i simply at the point
> where this hardware is not compatible with OpenBSD?
> 

Haven't had a chance to look at it yet.

-ml



Re: requesting help working around boot failures with supermicro atom board

2015-09-15 Thread Dewey Hylton
Mike Larkin  azathoth.net> writes:

> 
> On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote:
> > Dewey Hylton  gmail.com> writes:
> > 
> > > 
> > > Mike Larkin  azathoth.net> writes:
> > 
> > > > acpidump please.
> > > 
> > > my pleasure:
> > > 
> > > [demime removed a uuencoded section named
> > > supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines]
> > > 
> > > 
> > 
> > alright ... so this didn't work. i'll try to make the acpidump available via
> > another site somewhere. on that note, the bios allows selection between acpi
> > 1/2/3 - would it help at all to have acpidump for each of those three
> > settings?
> > 
> 
> Sure.
> 
> 

motherboard: supermicro x7spe-hf-d525 rev 1.0
bios: 1.2b

at the end of this link is an archive containing acpidump output for all
three acpi settings in the bios (1.0, 2.0, 3.0). 

https://goo.gl/tWGL6C

i apologize for the somewhat hidden link; gmane wouldn't allow me to post
the full link because it's greater than 80 characters.

please let me know if i can help in any way; i honestly know nothing about
acpi but am willing to learn or assist otherwise if it means understanding
and potentially fixing this issue.



Re: requesting help working around boot failures with supermicro atom board

2015-09-15 Thread Mike Larkin
On Tue, Sep 15, 2015 at 07:16:40PM +, Dewey Hylton wrote:
> Dewey Hylton  gmail.com> writes:
> 
> > 
> > Mike Larkin  azathoth.net> writes:
> 
> > > acpidump please.
> > 
> > my pleasure:
> > 
> > [demime removed a uuencoded section named
> > supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines]
> > 
> > 
> 
> alright ... so this didn't work. i'll try to make the acpidump available via
> another site somewhere. on that note, the bios allows selection between acpi
> 1/2/3 - would it help at all to have acpidump for each of those three 
> settings?
> 

Sure.



Re: requesting help working around boot failures with supermicro atom board

2015-09-15 Thread Dewey Hylton
Dewey Hylton  gmail.com> writes:

> 
> Mark Kettenis  xs4all.nl> writes:

> > Oh that is interesting.  Can you try disabling the lm(4) driver in
> > your kernel?  You can do:
> > 
> > # config -ef /bsd
> > ...
> > ukc> disable lm
> > 254 lm0 disabled
> > 255 lm* disabled
> > 256 lm* disabled
> > ukc> quit
> > Saving modified kernel.
> > # reboot
> > 
> > That reboot will probably still hang.  But it'd be interesting to see
> > if any subsequent reboots work better.
> 


sadly, the first thing i heard when entering the lab this morning was BEP!

so disabling the sensor drivers in the kernel did not do the trick. without
other ideas, i'm down to providing acpidump output and hoping someone can
tell me where to go next ...



Re: requesting help working around boot failures with supermicro atom board

2015-09-15 Thread Dewey Hylton
Dewey Hylton  gmail.com> writes:

> 
> Mike Larkin  azathoth.net> writes:

> > acpidump please.
> 
> my pleasure:
> 
> [demime removed a uuencoded section named
> supermicro-X7SPE-HF-D525-acpidump.tgz which was 276 lines]
> 
> 

alright ... so this didn't work. i'll try to make the acpidump available via
another site somewhere. on that note, the bios allows selection between acpi
1/2/3 - would it help at all to have acpidump for each of those three settings?



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Kurt Mosiejczuk
On Sat, Sep 12, 2015 at 03:51:36PM +, Dewey Hylton wrote:

> the only real differences i see are:
> 1) bios revision
> 2) secondary disk attached to different sata port
> 3) sensors only present on working machine

I've had this issue with the same systems.  Never guessed it would be OpenBSD
specific.  What I've found to make it stop happening is pulling the board
out and redoing the thermal paste for the CPU heatsink.  I had found 
some reference indicating that the alarm I got might be because of overheating.

The difference between the boxes may be the attention to detail the factory
worker who put it together had that day.

Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI difference
between OpenBSD and Linux.  Perhaps OpenBSD runs the CPU hotter before
turning it back over to the BIOS on reboot?

--Kurt



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Dewey Hylton
Patrick Dohman  comcast.net> writes:

> 
> Any thermal settings in the bios? CPU performance, Fan Speed etc..
> 
> Does the fan idle correctly? Often intel chipsets will throttle the fan
> during a bios test.
> 
> Perhaps ACPI is not routing an interrupt??

Not much is available to be tweaked in this particular setup, though i do
have the options of acpi 1, 2, 3. changing those doesn't appear to result in
any difference.

regarding ACPI not routing an interrupt ... can you be more specific? is
there some way i could test this?



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Dewey Hylton
Kurt Mosiejczuk  se.rit.edu> writes:

> 
> On Sat, Sep 12, 2015 at 03:51:36PM +, Dewey Hylton wrote:
> 
> > the only real differences i see are:
> > 1) bios revision
> > 2) secondary disk attached to different sata port
> > 3) sensors only present on working machine
> 
> I've had this issue with the same systems.  Never guessed it would be OpenBSD
> specific.  What I've found to make it stop happening is pulling the board
> out and redoing the thermal paste for the CPU heatsink.  I had found 
> some reference indicating that the alarm I got might be because of
> overheating.
> 
> The difference between the boxes may be the attention to detail the factory
> worker who put it together had that day.
> 
> Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI difference
> between OpenBSD and Linux.  Perhaps OpenBSD runs the CPU hotter before
> turning it back over to the BIOS on reboot?
> 
> --Kurt

this is great information; thanks. any idea where the temperature reference
can be found?

i may be able to log the cpu temperature in both operating systems in order
to compare ...
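One way to do that comparison on the Linux side is a small logger, sketched below under assumptions: board and CPU temperatures are exposed through the Linux hwmon sysfs interface (`/sys/class/hwmon/hwmon*/temp*_input`, values in millidegrees Celsius), and which sensors actually appear on this board is not known. `log_forever` and the one-line log format are hypothetical. As pointed out elsewhere in the thread, this can only record the last sample before the OS shuts down, not the temperature while the BIOS is running.

```python
import glob
import os
import time


def read_millideg(path):
    """Linux hwmon temp*_input files hold millidegrees Celsius, e.g. "48000"."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def sample_linux():
    """Collect every readable hwmon temperature input as {"hwmonN.tempM": degC}."""
    readings = {}
    for path in glob.glob("/sys/class/hwmon/hwmon*/temp*_input"):
        hwmon = os.path.basename(os.path.dirname(path))
        sensor = os.path.basename(path)[: -len("_input")]
        try:
            readings[f"{hwmon}.{sensor}"] = read_millideg(path)
        except (OSError, ValueError):
            continue  # some drivers expose inputs that cannot be read
    return readings


def format_sample(timestamp, readings):
    """One log line: epoch seconds, then name=value pairs sorted by name."""
    pairs = " ".join(f"{name}={value:.2f}" for name, value in sorted(readings.items()))
    return f"{int(timestamp)} {pairs}"


def log_forever(logfile, interval=10):
    """Append one sample every `interval` seconds, fsync'ing each line so the
    last sample taken before a reboot is still on disk afterwards."""
    while True:
        with open(logfile, "a") as f:
            f.write(format_sample(time.time(), sample_linux()) + "\n")
            f.flush()
            os.fsync(f.fileno())
        time.sleep(interval)
```

Syncing after every line trades a little disk I/O for not losing the final readings when the box is rebooted mid-interval.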



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Dewey Hylton
Kurt Mosiejczuk  se.rit.edu> writes:

> 
> On Mon, Sep 14, 2015 at 05:15:01PM +, Dewey Hylton wrote:
> 
> > > I've had this issue with the same systems.  Never guessed it would
> > > be OpenBSD specific.  What I've found to make it stop happening is
> > > pulling the board out and redoing the thermal paste for the CPU
> > > heatsink.  I had found some reference indicating that the alarm I
> > > got might be because of overheating.
> 
> > > The difference between the boxes may be the attention to detail the
> > > factory worker who put it together had that day.
> 
> > > Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI
> > > difference between OpenBSD and Linux.  Perhaps OpenBSD runs the CPU
> > > hotter before turning it back over to the BIOS on reboot?
> 
> > this is great information; thanks. any idea where the temperature
> > reference can be found?
> 
> I don't remember, it was at least a couple years ago.  It was only one
> reference too.  Most talked about listening to beep codes, which this
> wasn't really beep codes...
> 
> > i may be able to log the cpu temperature in both operating systems in
> > order to compare ...
> 
> Possibly, but notice I said "before turning it back over to the BIOS".
> If it's a difference in OS shutdown, it will be difficult to log the
> temperature.
> 
> --Kurt

understood, but i did uncover something that might provide a hint ... i
haven't duplicated the results more than half a dozen times, but so far it's
been consistent:

after first booting openbsd, i see the following output:

# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=31.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.50 degC
hw.sensors.lm1.temp2=36.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.50 degC
hw.sensors.lm1.temp2=36.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=31.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.50 degC
hw.sensors.lm1.temp2=36.00 degC
# reboot

and meet with success ... if i wait just a few minutes (2) i end up with this:

# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=48.00 degC
hw.sensors.lm1.temp1=52.00 degC
hw.sensors.lm1.temp2=35.50 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# sysctl -a|grep 'sensors.*temp'
hw.sensors.cpu0.temp0=30.00 degC
hw.sensors.lm1.temp0=0.00 degC
hw.sensors.lm1.temp1=14.00 degC
hw.sensors.lm1.temp2=14.00 degC
# reboot

BEEEP!

again, not a very scientific/exacting approach, but half a dozen times i've
seen the same results. i don't know what it is that trips up the sensors,
but that's when i seem to have the issue.

now, this is running the 5.4 installation (i downgraded at someone's
suggestion for testing) and i can easily reinstall from current snapshot to
see if this may be an unrelated bug.

but until then, does this scenario make sense to anyone?
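The pattern above (lm1 readings collapsing to 0.00/14.00 degC while cpu0 stays near 30 degC, and a later message in this thread reports negative lm values on -current) can be checked mechanically before a planned reboot. A minimal sketch, assuming the `hw.sensors.<device>.tempN=<value> degC` line format shown in the transcripts; the 15 degC floor is an arbitrary guess rather than a documented limit, and `suspicious` is a hypothetical helper.

```python
import re

# Matches lines like "hw.sensors.lm1.temp0=48.00 degC", the format printed by
# `sysctl -a | grep 'sensors.*temp'` in the transcripts above.
SENSOR_RE = re.compile(r"^hw\.sensors\.(\w+)\.(temp\d+)=(-?\d+(?:\.\d+)?) degC")


def parse_temps(text):
    """Return {"lm1.temp0": 48.0, ...} for every temperature line in text."""
    temps = {}
    for line in text.splitlines():
        m = SENSOR_RE.match(line.strip())
        if m:
            device, sensor, value = m.groups()
            temps[f"{device}.{sensor}"] = float(value)
    return temps


def suspicious(temps, floor=15.0, prefix="lm"):
    """Hypothetical heuristic: list lm(4) readings at or below `floor` degC.

    Healthy runs above show lm1 at roughly 35-53 degC; failing runs show
    0.00/14.00 degC while cpu0 stays near 30 degC.
    """
    return sorted(k for k, v in temps.items() if k.startswith(prefix) and v <= floor)
```

On the machine itself the input could come from `subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout`, checked just before each reboot.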



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Mark Kettenis
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # reboot
> 
> BEEEP!

Oh that is interesting.  Can you try disabling the lm(4) driver in
your kernel?  You can do:

# config -ef /bsd
...
ukc> disable lm
254 lm0 disabled
255 lm* disabled
256 lm* disabled
ukc> quit
Saving modified kernel.
# reboot

That reboot will probably still hang.  But it'd be interesting to see
if any subsequent reboots work better.



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Dewey Hylton
Mike Larkin  azathoth.net> writes:

> 
> On Fri, Sep 11, 2015 at 06:38:23PM -0400, dewey.hylton  gmail.com wrote:
> > hi all. i’m having difficulty with this board:
> > 
> > Supermicro X7SPE-HD-D525 rev1
> > 
> > i have several similar systems, each running an older version of OpenBSD for
> > a few years without incident. except this one …
> > 
> > running OpenBSD 5.7 i386, from cold start it boots just fine and runs until
> > rebooted. once rebooted, however, prior to anything being displayed (i assume
> > this is early in the bios post phase) i get one very long beep. super micro
> > tells me this indicates inability to correctly initialize the memory. okay,
> > so i’ve changed memory for known working components and have the same issue.
> > at this point, the only thing that gets me booting again is to remove power
> > and then restore power. it then boots fine from cold start, and fails on the
> > next reboot (as in, “reboot” from the command line). once in long-beep
> > failure mode, neither the hardware reset button nor the power button can make
> > the machine boot again. the only thing that works is removing power. every
> > once in a while it will reboot successfully, only to fail in the same manner
> > on the next attempt.
> > 
> > super micro has had me flash bios, clear cmos, boot from different devices
> > and with nothing connected, etc. the results are the same: when rebooting
> > from openbsd, next boot fails until power is removed/restored. super micro
> > blames openbsd.
> > 
> > i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a
> > reboot every 5 minutes and left it overnight. i logged 554 successful reboots.
> > 
> > i have since installed the latest available openbsd amd64 snapshot, and am
> > seeing the same failures.
> > 
> > i’m wondering if something could be disabled (boot -c ?) or if something else
> > raises a red flag and might have a workaround. this has me stumped. i would
> > very much appreciate a clue stick.
> > 
> > dmesg follows:
> > 
> 
> acpidump please.

my pleasure:

[demime removed a uuencoded section named supermicro-X7SPE-HF-D525-acpidump.tgz 
which was 276 lines]



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Dewey Hylton
Dewey Hylton  gmail.com> writes:



> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=31.00 degC
> hw.sensors.lm1.temp0=48.00 degC
> hw.sensors.lm1.temp1=52.50 degC
> hw.sensors.lm1.temp2=36.00 degC
> # reboot
> 
> and meet with success ... if i wait just a few minutes (2) i end up with this:
> 
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=48.00 degC
> hw.sensors.lm1.temp1=52.00 degC
> hw.sensors.lm1.temp2=35.50 degC
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # sysctl -a|grep 'sensors.*temp'
> hw.sensors.cpu0.temp0=30.00 degC
> hw.sensors.lm1.temp0=0.00 degC
> hw.sensors.lm1.temp1=14.00 degC
> hw.sensors.lm1.temp2=14.00 degC
> # reboot
> 
> BEEEP!
> 
> again, not a very scientific/exacting approach, but half a dozen times i've
> seen the same results. i don't know what it is that trips up the sensors,
> but that's when i seem to have the issue.
> 
> now, this is running the 5.4 installation (i downgraded at someone's
> suggestion for testing) and i can easily reinstall from current snapshot to
> see if this may be an unrelated bug.
> 
> but until then, does this scenario make sense to anyone?

i now have a fresh install of current/amd64. the snapshot appears to be a
bit on the broken side, as some of the libcrypto/libssl stuff is missing
(this on both i386 and amd64 snapshots) and this prevents me from logging in
via ssh and copy/pasting from terminal. i see this has been reported on the
list already, so a newer snapshot (tomorrow?) may fix this.

but still i have several -current boots under my belt, and the sysctl
temperature thing appears to be similar to what it was with the 5.4
installation. the lm temps are showing negative numbers instead of 0 and 14,
but once that happens the box fails to reboot properly as before.

one other thing i've noticed now that i've reinstalled so many times in the
past few days: it does not matter how long i've been booted into the ramdisk
kernel for installation or whatever - it can sit for hours, and always
reboots properly. no support for sensors in the ramdisk kernel. coincidence?



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Dewey Hylton
Mark Kettenis  xs4all.nl> writes:

> 
> > # sysctl -a|grep 'sensors.*temp'
> > hw.sensors.cpu0.temp0=30.00 degC
> > hw.sensors.lm1.temp0=0.00 degC
> > hw.sensors.lm1.temp1=14.00 degC
> > hw.sensors.lm1.temp2=14.00 degC
> > # reboot
> > 
> > BEEEP!
> 
> Oh that is interesting.  Can you try disabling the lm(4) driver in
> your kernel?  You can do:
> 
> # config -ef /bsd
> ...
> ukc> disable lm
> 254 lm0 disabled
> 255 lm* disabled
> 256 lm* disabled
> ukc> quit
> Saving modified kernel.
> # reboot
> 
> That reboot will probably still hang.  But it'd be interesting to see
> if any subsequent reboots work better.

*this* interests me, and was basically what i was asking in the original
post - except i had no idea what might need to be disabled. one step at a
time, it's been interesting the things that have popped up.

still no idea whether this has anything to do with the seemingly
openbsd-only issue, but ...

i made this change, booted the new kernel, ran 'cksum /dev/mem' a bunch of
times in hopes of raising the temperature somewhat (did get to 36C, which is
higher than in my previous tests). then i rebooted, and the box came back up
without incident.

so i'm going to run through this several times with reboots in every 20
minutes or so and see if it survives the night.
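Scoring such an overnight run (or the earlier Linux run of 554 reboots at 5-minute intervals) comes down to looking for gaps in a boot log. A sketch under assumptions: each boot appends the output of `date +%s` to a log file, for example from /etc/rc.local, and root's crontab reboots the box on a schedule such as `*/20 * * * * /sbin/shutdown -r now`. A gap much longer than the schedule suggests a reboot that ended in the long beep and waited for a power cycle. The log path, the one-integer-per-line format, and the `slack` factor are all hypothetical.

```python
def find_stalls(boot_times, interval, slack=2.0):
    """Return (good, stalls) for a list of boot timestamps in epoch seconds.

    A gap of up to interval * slack counts as a successful scheduled reboot;
    anything longer is reported as a stall (prev_boot, next_boot), i.e. the
    box presumably sat at the long beep until power was removed.
    """
    times = sorted(boot_times)
    good = 0
    stalls = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev <= interval * slack:
            good += 1
        else:
            stalls.append((prev, cur))
    return good, stalls


def read_boot_log(path):
    """Hypothetical log format: one epoch-seconds integer per line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```

For a 20-minute schedule this would be called as `find_stalls(read_boot_log(path), 20 * 60)`; a clean night returns an empty stall list.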



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Chris Cappuccio
Kurt Mosiejczuk [kurt-open...@se.rit.edu] wrote:
> 
> Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI difference
> between OpenBSD and Linux.  Perhaps OpenBSD runs the CPU hotter before
> turning it back over to the BIOS on reboot?
> 

OpenBSD 5.8-current enters deeper C states than the ACPI describes. The
ACPI documentation is not to be taken literally, according to Intel. So
now OpenBSD's behavior should be similar to Linux in this regard, and
therefore, you'll see lower CPU temperatures. (With the C states and 
mwait features that are enabled today, your temps are already lower
than previous releases.)



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Kurt Mosiejczuk
On Mon, Sep 14, 2015 at 05:15:01PM +, Dewey Hylton wrote:

> > I've had this issue with the same systems.  Never guessed it would
> > be OpenBSD specific.  What I've found to make it stop happening is
> > pulling the board out and redoing the thermal paste for the CPU
> > heatsink.  I had found some reference indicating that the alarm I
> > got might be because of overheating.

> > The difference between the boxes may be the attention to detail the
> > factory worker who put it together had that day.

> > Hearing that Linux doesn't trip it, I'm wondering if it's an ACPI
> > difference between OpenBSD and Linux.  Perhaps OpenBSD runs the CPU
> > hotter before turning it back over to the BIOS on reboot?

> this is great information; thanks. any idea where the temperature
> reference can be found?

I don't remember, it was at least a couple years ago.  It was only one
reference too.  Most talked about listening to beep codes, which this
wasn't really beep codes...

> i may be able to log the cpu temperature in both operating systems in
> order to compare ...

Possibly, but notice I said "before turning it back over to the BIOS".
If it's a difference in OS shutdown, it will be difficult to log the
temperature.

--Kurt



Re: requesting help working around boot failures with supermicro atom board

2015-09-14 Thread Mike Larkin
On Fri, Sep 11, 2015 at 06:38:23PM -0400, dewey.hyl...@gmail.com wrote:
> hi all. i’m having difficulty with this board:
> 
> Supermicro X7SPE-HD-D525 rev1
> 
> i have several similar systems, each running an older version of OpenBSD for 
> a few years without incident. except this one …
> 
> running OpenBSD 5.7 i386, from cold start it boots just fine and runs until 
> rebooted. once rebooted, however, prior to anything being displayed (i assume 
> this is early in the bios post phase) i get one very long beep. super micro 
> tells me this indicates inability to correctly initialize the memory. okay, 
> so i’ve changed memory for known working components and have the same
> issue. at this point, the only thing that gets me booting again is to remove 
> power and then restore power. it then boots fine from cold start, and fails 
> on the next reboot (as in, “reboot” from the command line). once in
> long-beep failure mode, neither the hardware reset button nor the power 
> button can make the machine boot again. the only thing that works is removing 
> power. every once in a while it will reboot successfully, only to fail in the 
> same manner on the next attempt.
> 
> super micro has had me flash bios, clear cmos, boot from different devices 
> and with nothing connected, etc. the results are the same: when rebooting 
> from openbsd, next boot fails until power is removed/restored. super micro 
> blames openbsd.
> 
> i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a 
> reboot every 5 minutes and left it overnight. i logged 554 successful reboots.
> 
> i have since installed the latest available openbsd amd64 snapshot, and am 
> seeing the same failures.
> 
> i’m wondering if something could be disabled (boot -c ?) or if something
> else raises a red flag and might have a workaround. this has me stumped. i 
> would very much appreciate a clue stick. 
> 
> dmesg follows:
> 

acpidump please.

> OpenBSD 5.8-current (GENERIC.MP) #1364: Wed Sep  9 17:32:01 MDT 2015
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4277665792 (4079MB)
> avail mem = 4144070656 (3952MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
> acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) 
> USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4) 
> P0P6(S4) P0P7(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu0: 512KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 199MHz
> cpu0: mwait min=64, max=64, C-substates=0.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu1: 512KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 1 (application processor)
> cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu2: 512KB 64b/line 8-way L2 cache
> cpu2: smt 1, core 0, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
> cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu3: 512KB 64b/line 8-way L2 cache
> cpu3: smt 1, core 1, package 0
> ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
> ioapic0: misconfigured as apic 1, remapped to apid 4
> acpimcfg0 at acpi0 addr 0xe000, bus 0-255
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 4 (P0P1)
> acpiprt2 at acpi0: bus 1 (P0P4)
> acpiprt3 at acpi0: bus 2 (P0P8)
> acpiprt4 at acpi0: bus 3 (P0P9)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicpu2 at acpi0: C1(@1 halt!)
> acpicpu3 at acpi0: C1(@1 halt!)
> acpibtn0 at acpi0: SLPB
> acpibtn1 at acpi0: PWRB
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 

Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread Sonic
On Sun, Sep 13, 2015 at 10:15 AM, Sonic  wrote:
> I also have this issue with OpenBSD on this box. Every time I reboot
> after updating a snapshot I need to power cycle to eliminate the long
> beep error. For some reason I kept thinking it was due to my replacing
> the stock PSU with a picoPSU for silent operation as a BIOS upgrade
> did not solve the issue. Never had this problem with the previous
> generation D510 based systems, only this D525 based version.

My mistake - the board I have trouble with is the X7SPE-HF-D525 and
not the X7SPA-HF-D525.



Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread Sonic
My mistake - the board I have trouble with is the X7SPE-HF-D525 and
not the X7SPA-HF-D525.

On Sun, Sep 13, 2015 at 11:23 AM, Sonic  wrote:
> On Sun, Sep 13, 2015 at 10:15 AM, Sonic  wrote:
>> I also have this issue with OpenBSD on this box. Every time I reboot
>> after updating a snapshot I need to power cycle to eliminate the long
>> beep error. For some reason I kept thinking it was due to my replacing
>> the stock PSU with a picoPSU for silent operation as a BIOS upgrade
>> did not solve the issue. Never had this problem with the previous
>> generation D510 based systems, only this D525 based version.
>
> My mistake - the board I have trouble with is the X7SPE-HF-D525 and
> not the X7SPA-HF-D525.



Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread Mark Patruck
Never had any reboot issues on two X7SPA-HF-D525 (Bios R1.2b) and i'm
updating/rebooting pretty often the last few weeks.

On Sun, Sep 13, 2015 at 10:15:31AM -0400, Sonic wrote:
> On Sat, Sep 12, 2015 at 11:02 PM,   wrote:
> > X7SPA-HF-D525
> 
> I also have this issue with OpenBSD on this box. Every time I reboot
> after updating a snapshot I need to power cycle to eliminate the long
> beep error. For some reason I kept thinking it was due to my replacing
> the stock PSU with a picoPSU for silent operation as a BIOS upgrade
> did not solve the issue. Never had this problem with the previous
> generation D510 based systems, only this D525 based version.
> 
> Chris
> 

-- 
Mark Patruck ( mark at wrapped.cx )
GPG key 0xF2865E51 / 187F F6D3 EE04 1DCE 1C74  F644 0D3C F66F F286 5E51

http://www.wrapped.cx



Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread Sonic
On Sat, Sep 12, 2015 at 11:02 PM,   wrote:
> X7SPA-HF-D525

I also have this issue with OpenBSD on this box. Every time I reboot
after updating a snapshot I need to power cycle to eliminate the long
beep error. For some reason I kept thinking it was due to my replacing
the stock PSU with a picoPSU for silent operation as a BIOS upgrade
did not solve the issue. Never had this problem with the previous
generation D510 based systems, only this D525 based version.

Chris



Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread Patrick Dohman
Any thermal settings in the bios? CPU performance, Fan Speed etc..

Does the fan idle correctly? Often intel chipsets will throttle the fan during 
a bios test.

Perhaps ACPI is not routing an interrupt??

Regards
Patrick


> On Sep 11, 2015, at 5:38 PM, dewey.hyl...@gmail.com wrote:
> 
> hi all. i’m having difficulty with this board:
> 
> Supermicro X7SPE-HD-D525 rev1
> 
> i have several similar systems, each running an older version of OpenBSD for 
> a few years without incident. except this one …
> 
> running OpenBSD 5.7 i386, from cold start it boots just fine and runs until 
> rebooted. once rebooted, however, prior to anything being displayed (i assume 
> this is early in the bios post phase) i get one very long beep. super micro 
> tells me this indicates inability to correctly initialize the memory. okay, 
> so i’ve changed memory for known working components and have the same issue. 
> at this point, the only thing that gets me booting again is to remove power 
> and then restore power. it then boots fine from cold start, and fails on the 
> next reboot (as in, “reboot” from the command line). once in long-beep 
> failure mode, neither the hardware reset button nor the power button can make 
> the machine boot again. the only thing that works is removing power. every 
> once in a while it will reboot successfully, only to fail in the same manner 
> on the next attempt.
> 
> super micro has had me flash bios, clear cmos, boot from different devices 
> and with nothing connected, etc. the results are the same: when rebooting 
> from openbsd, next boot fails until power is removed/restored. super micro 
> blames openbsd.
> 
> i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a 
> reboot every 5 minutes and left it overnight. i logged 554 successful reboots.
> 
> i have since installed the latest available openbsd amd64 snapshot, and am 
> seeing the same failures.
> 
> i’m wondering if something could be disabled (boot -c ?) or if something else 
> raises a red flag and might have a workaround. this has me stumped. i would 
> very much appreciate a clue stick. 

Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread lists
> i have indeed disabled quick/quiet boot options to no avail. i've also tried
> failsafe mode, ide vs ahci, acpi v1/2/3. the issue does not present with
> linux, which makes me wonder whether the openbsd kernel is somehow making
> some kind of hardware setting change that is not cleared on reboot. despite
> this only presenting in openbsd, i still blame hardware - but am hoping
> there might be some openbsd-related tweak.
> 
> thanks for the idea.
> 
> 

Hi Dewey,

On my X7SPA-HF-D525 system Quick Boot has never been enabled, and the
controller is set to AHCI mode. The storage device has never mattered
either: whether booting from a USB flash stick or an HDD (I have run
both for long periods), the problem has kept manifesting itself over
the years.

I should make clear that not every reboot leads to this condition here,
only reboots after the system has been running for a considerably
longer time. Typically I run the system for a while (as long as
possible / needed) between snapshot upgrades, as it's in use 24/7
behind a true sine wave UPS. I have ruled out the power supply and
memory, and there are no peripherals attached.

After the system has been running for a while I usually download the sets
from a mirror, rsync them to local storage, then issue a reboot.
There is a pretty high chance the system will then NOT boot at all,
exactly as you're reporting, even though it goes through the reboot
process OK, cleanly exiting the OS and doing a reset. It enters the early
stages of the POST and cannot complete it, but the system is still
accessible over the IPMI BMC and can be power cycled etc. with the IPMI
over LAN utility, and via the web based interface on the BMC as well.

The system cannot boot up properly once it enters this condition: on an
IPMI power cycle, or power off and on, the POST goes into a repeating
pattern of a long beep (~5-7s) and silence (~1s), which means memory
error, but it's not the memory's fault.

In this condition the IPMI cannot be used to reset the system, only to
power it off/on or power cycle it. My strongest presumption, however, is
that it is a BIOS POST problem or an IPMI hook related to the BIOS POST,
and I would like that taken further with Supermicro if the OS factor is
ruled out as well.

The system can only be brought back with the PSU breaker switch, or by
disconnecting the power (mains) cable for 5s and reconnecting it.

Once back up, the system boots, passes through the upgrade OK, can be
rebooted, and the problem IS NOT present. I have tested that several
reboots work OK, so it's not caused by an unclean OS exit; it survives
several reboot cycles / upgrades etc. until you leave it running for a
longer period of time. This may be what you are seeing with the Linux
reboot cycle test script.

The system runs for a long while with no issues, and after it is
rebooted, no matter how (over SSH, local keyboard, serial cable, serial
over LAN (Ethernet) with the IPMI tool, or the IPMI web based tool), it
gets into this flawed state where it cannot pass the POST. So the
system is a total failure for hosting at a data centre without a PDU
with a real power disconnect feature.

I have never run Linux on this box and cannot do so (live system in
production, no spares or budget for this), but I would recommend that
you try a longer run with Linux and see whether you can trigger this
independent of the OS.

The most important thing for me is to know whether it is OS dependent or
not, as this will be very valuable when taking it back to Supermicro,
or alternatively for comparing the reboot state between OpenBSD and
another OS.

Thank you for your tests and perseverance on this, much appreciated.

Regards,
Anton



Re: requesting help working around boot failures with supermicro atom board

2015-09-13 Thread Dewey Hylton
Sonic  gmail.com> writes:

> On Sun, Sep 13, 2015 at 10:15 AM, Sonic  gmail.com> wrote:
> > I also have this issue with OpenBSD on this box. Every time I reboot
> > after updating a snapshot I need to power cycle to eliminate the long
> > beep error. For some reason I kept thinking it was due to my replacing
> > the stock PSU with a picoPSU for silent operation as a BIOS upgrade
> > did not solve the issue. Never had this problem with the previous
> > generation D510 based systems, only this D525 based version.
> 
> My mistake - the board I have trouble with is the X7SPE-HF-D525 and
> not the X7SPA-HF-D525.

this is the same board i have (X7SPE-HF-D525). what board and bios revision
do you have? my board is rev 1.0 and bios is 1.2b.



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread lists
> Whether they are identical or not, showing us a dmesg diff with a known
> working release booted from both a working and the non-working system
> could also be helpful.

Another Supermicro X7SPA-HF-D525 board (same chipset/CPU combination)
has been having the same issue since early 2011 (the entire life span
of the system), always running a recent snapshot:

http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA-HF-D525.cfm

There is absolutely no sense in running a 2010 snapshot now, except for
experiments as suggested by Benny.

Please try to see whether it makes a difference if the system is
rebooted over SSH, on the local console, over a serial console, or with
serial over LAN. I can provide test results for all these methods
given some technical suggestion / solution.

You may want DMI / PCI / ACPI dumps; let me know if you want these from
my system too, and how (attach brief newbie instructions please).

So far, nothing has solved it for me either, except a power cycle via
the PSU breaker each time this happens. I never tried any OS other than
OpenBSD, and thought it was a hardware fault (memory, going by the beep
code), but it's not (confirmed with long runs of memtest). The system
runs for very long intervals without any other issues apart from the
reboot behaviour described in the original post, so I can confirm the
same problem. Running the latest BIOS and IPMI firmware.

Thanks, Dewey, for testing this more extensively than I had the nerve to.



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Dewey Hylton
Richard Laysell  xiphosura.co.uk> writes:

> 
> On Fri, 11 Sep 2015 18:38:23 -0400 (EDT)
> "dewey.hylton  gmail.com"  gmail.com> wrote:
> 
> > hi all. i’m having difficulty with this board:
> > 
> > Supermicro X7SPE-HD-D525 rev1
> > 
> > i have several similar systems, each running an older version of
> > OpenBSD for a few years without incident. except this one …
> > 
> 
> 
> Do you have Quick Boot enabled in the BIOS?  If so, try disabling it.
> 
> I have known this to cause problems (on other boards - no experience with
> this one).  Quick Boot seems to do a quick and dirty setup and
> doesn't fully initialise all of the devices.  This may be why you are
> seeing it boot OK if you remove the power - the devices then either get
> reset to their default states or the BIOS has to set them up.
> 
> Regards,
> 
> Richard

i have indeed disabled quick/quiet boot options to no avail. i've also tried
failsafe mode, ide vs ahci, acpi v1/2/3. the issue does not present with
linux, which makes me wonder whether the openbsd kernel is somehow making
some kind of hardware setting change that is not cleared on reboot. despite
this only presenting in openbsd, i still blame hardware - but am hoping
there might be some openbsd-related tweak.

thanks for the idea.



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Dewey Hylton
  wrant.com> writes:

> 
> > Whether they are identical or not, showing us a dmesg diff with a known
> > working release booted from both a working and the non-working system
> > could also be helpful.
> 
> Another Supermicro X7SPA-HF-D525 board (same chipset/CPU combination)
> has been having the same issue since early 2011 (the entire life span
> of the system), always running a recent snapshot:
> 
> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA-HF-D525.cfm
> 
> There is absolutely no sense in running a 2010 snapshot now, except for
> experiments as suggested by Benny.
> 
> Please try and see if it makes a difference comparing reboots of the
> system over SSH, on the local console, over serial console, and with
> serial over LAN. I can provide test results over all these methods
> given some tech suggestion / solution.
> 
> You may want to have a DMI / PCI / ACPI dumps, let me know how / if you
> want these from my system too (attach brief newbie instructions please).
> 
> So far, nothing has solved it yet for me too, except power cycle via the
> PSU breaker each time this happens. Never tried any other OS except
> OpenBSD, thought it was the hardware (memory by the beep code) fault,
> but it's not (confirmed with long runs of memtest). The system
> runs for very long intervals without any other issues, except the
> reboot behaviour in the original post, confirming same problem. Running
> latest BIOS and IPMI firmwares.
> 
> Thanks, Dewey for testing this more extensively than I had the nerve to.

this is great information, and i've passed it along to supermicro as an "i'm
not the only one with this issue" data point.

since i'm apparently not alone, this looks more like a board/firmware design
issue - just curious why none of my other boards have this issue.

i'm not knowledgeable enough to provide newbie instructions on the dumps,
but if doing this makes sense to anyone here with more hardware experience
i'm certainly willing to try it.
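Since brief instructions for the DMI / PCI / ACPI dumps were asked for above, here is a hedged sketch of a helper that gathers them into one directory. It assumes OpenBSD's base tools sysctl(8), pcidump(8) and acpidump(8); dmidecode is not part of base and is only used if it has been installed from packages. The function and output file names are made up for illustration, and any tool that is missing or fails is noted in a MANIFEST file instead of aborting the run.

```shell
#!/bin/sh
# Sketch: gather hardware/firmware dumps for a vendor support ticket.
# Assumes OpenBSD base tools (sysctl, pcidump, acpidump); dmidecode is
# not in base and is only used if it has been installed separately.
# Run as root: acpidump and dmidecode read firmware tables from memory.
# Hypothetical names: run_dump, collect_dumps, the output file names.

# run_dump MANIFEST OUTFILE CMD [ARGS...]
# Runs CMD if it exists, saves its output to OUTFILE, notes the result.
run_dump() {
    manifest=$1
    outfile=$2
    shift 2
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1: not installed, skipped" >> "$manifest"
    elif "$@" > "$outfile" 2>&1; then
        echo "$1: ok" >> "$manifest"
    else
        echo "$1: failed, see $(basename "$outfile")" >> "$manifest"
    fi
}

# collect_dumps OUTDIR
collect_dumps() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    manifest=$outdir/MANIFEST
    : > "$manifest"

    # OpenBSD keeps the messages of the current boot in this file.
    if [ -r /var/run/dmesg.boot ]; then
        cp /var/run/dmesg.boot "$outdir/dmesg.boot"
        echo "dmesg.boot: copied" >> "$manifest"
    else
        echo "dmesg.boot: not found" >> "$manifest"
    fi

    # hw.* includes vendor/product strings and hw.sensors.
    run_dump "$manifest" "$outdir/sysctl-hw.txt" sysctl hw
    run_dump "$manifest" "$outdir/pcidump.txt" pcidump -v
    run_dump "$manifest" "$outdir/dmidecode.txt" dmidecode

    # acpidump -o writes one file per ACPI table, named after the prefix.
    run_dump "$manifest" "$outdir/acpidump.log" acpidump -o "$outdir/acpi"
}
```

Usage, as root: `collect_dumps /tmp/x7spe-dumps`, then `tar czf x7spe-dumps.tgz -C /tmp x7spe-dumps` to get a single file to attach. Recording failures in the manifest rather than stopping means a partial set is still useful.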

regarding the comparison of reboots, do you mean testing reboots initiated
via different means (ssh/console/etc.)? if so, i've done that and ruled them
out. last test was a shell script with sleep/reboot commands which executed
via rc.local - meaning there was no user login prior to the failure.

thanks for this information.



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Richard Laysell
On Fri, 11 Sep 2015 18:38:23 -0400 (EDT)
"dewey.hyl...@gmail.com"  wrote:

> hi all. i’m having difficulty with this board:
> 
> Supermicro X7SPE-HD-D525 rev1
> 
> i have several similar systems, each running an older version of
> OpenBSD for a few years without incident. except this one …
> 


Do you have Quick Boot enabled in the BIOS?  If so, try disabling it.

I have known this to cause problems (on other boards - no experience with
this one).  Quick Boot seems to do a quick and dirty setup and
doesn't fully initialise all of the devices.  This may be why you are
seeing it boot OK if you remove the power - the devices then either get
reset to their default states or the BIOS has to set them up.

Regards,

Richard



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Dewey Hylton
Benny Lofgren  lofgren.biz> writes:

> 
> Hi Dewey,
> 
> On 2015-09-12 00:38, dewey.hylton  gmail.com wrote:
> > hi all. i’m having difficulty with this board:
> 
> I noticed your mail somehow got posted twice, but I'm commenting on the
> first incarnation of it because the second had some characters like '\''
> mangled (UTF-8 copy/paste issue I presume).

i posted first via my normal zimbra server, which seemed to have gotten
hung up so i copied/pasted into the gmail web interface. i think the
second to show up may have been the original (via zimbra). no idea why
there's an issue. i only get daily digests via email so i'm posting this
via the gmane interface; if this doesn't work correctly i'll have to sit
down and figure out the best way to do this in the future.
> 
> > Supermicro X7SPE-HD-D525 rev1
> > i have several similar systems, each running an older version of OpenBSD
> > for a few years without incident. except this one …
> 
> You might already have tried this, but providing this information may
> give important clues to the rest of us trying to help you:
> 
> Since you say that your other similar systems are successfully running
> older versions of OpenBSD, have you tried running this new system with a
> version that you know works on the other boards?

i just installed 5.4 on this board, the same version that is running on its
original cluster mate (which is still running fine). same issue.

> 
> And if so, then have you tried moving on to subsequent versions in turn
> until you find the one which breaks? That is a really important piece of
> information.
> 
> Also, are those other systems "similar" or "identical"? If not
> identical, what differs? This is also important to get a grip on the
> problem.

these are identical in all ways, except for the current bios version
(which supermicro had me update when troubleshooting). i'll attempt
to back it down to the same version present on the working board and 
see where that gets me.

> 
> Whether they are identical or not, showing us a dmesg diff with a known
> working release booted from both a working and the non-working system
> could also be helpful.

i'll post the diff below.
> 
> Regards,
> 
> /Benny

thanks for your input. i was shocked to find that linux didn't produce
this issue as well; i don't understand how a board's failure to
post could have anything to do with an os which is not running during
the post. that may just show how much i (don't) understand the hardware
and bios side of things.



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Dewey Hylton
John E.P. Hynes  hytronix.com> writes:

> 
> Try booting the SP kernel and see if that works.  If it does, you might
> be running into a variant of an issue I've had on my SuperMicro boxen...
> 
> -John

john, i tried this (5.4 bsd.sp) and i'm seeing the same result. it didn't
occur to me to try this; thanks for the idea.



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Dewey Hylton
Dewey Hylton  gmail.com> writes:

> > Whether they are identical or not, showing us a dmesg diff with a known
> > working release booted from both a working and the non-working system
> > could also be helpful.
> 
> i'll post the diff below.

the only real differences i see are:
1) bios revision
2) secondary disk attached to different sata port
3) sensors only present on working machine

i'm having a hard time finding the older bios on the supermicro site, so i'm
reaching out to them.

i'm ignoring the disk channel difference.

i'm looking into the sensors - haven't found that in the bios yet.
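Besides the BIOS setup, the running system can show which board actually exports lm(4) sensor data. A hedged sketch, assuming OpenBSD's sysctl naming `hw.sensors.<device>.<type><index>=<value>` on stdin; the helper names are hypothetical, and the `lm` patterns match the attach and "disabling sensors" messages in the diff that follows.

```shell
#!/bin/sh
# Sketch: see which devices are exporting hardware sensors, e.g. whether
# lm(4) attached at all, when comparing a working and a failing board.
# Assumes OpenBSD's sysctl naming hw.sensors.<device>.<type><index>=<value>
# on stdin. Hypothetical names: sensor_devices, has_lm_sensors, lm_dmesg_lines.

# sensor_devices: print each sensor device name once, sorted.
sensor_devices() {
    sed -n 's/^hw\.sensors\.\([a-z][a-z]*[0-9][0-9]*\)\..*/\1/p' | sort -u
}

# has_lm_sensors: succeed if any lm(4) instance is exporting sensors.
has_lm_sensors() {
    sensor_devices | grep -q '^lm[0-9]'
}

# lm_dmesg_lines: lm(4) attach / disable messages from dmesg on stdin.
lm_dmesg_lines() {
    grep '^lm[0-9][0-9]*[ :]'
}
```

On each board, `sysctl hw.sensors | sensor_devices` and `lm_dmesg_lines < /var/run/dmesg.boot` give two short lists that can be compared directly.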

here's the diff:

$ diff inf1.dmesg.54i inf2.dmesg.54i.good
8,9c8,9
< bios0 at mainbus0: AT/286+ BIOS, date 07/19/13, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.6 @ 0x9ac00 (19 entries)
< bios0: vendor American Megatrends Inc. version "1.2b" date 07/19/13
---
> bios0 at mainbus0: AT/286+ BIOS, date 02/21/12, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.6 @ 0x9ac00 (19 entries)
> bios0: vendor American Megatrends Inc. version "1.2a" date 02/21/12
> bios0: vendor American Megatrends Inc. version "1.2a" date 02/21/12
18c18
< cpu0: apic clock running at 200MHz
---
> cpu0: apic clock running at 199MHz
20c20
< cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.81 GHz
---
> cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
23c23
< cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.81 GHz
---
> cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
26c26
< cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.81 GHz
---
> cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
43c43
< bios0: ROM list: 0xc/0x8000 0xc8000/0x1000
---
> bios0: ROM list: 0xc/0x8000
57c57
< em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 0c:c4:7a:54:90:8e
---
> em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:25:90:97:49:e0
60c60
< em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 0c:c4:7a:54:90:8f
---
> em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:25:90:97:49:e1
75c75
< sd0 at scsibus0 targ 0 lun 0:  SCSI3 0/direct fixed naa.500a07510920882c
---
> sd0 at scsibus0 targ 0 lun 0:  SCSI3 0/direct fixed naa.500a075109208807
77c77
< sd1 at scsibus0 targ 2 lun 0:  SCSI3 0/direct fixed naa.50025388500930cc
---
> sd1 at scsibus0 targ 5 lun 0:  SCSI3 0/direct fixed naa.50025388a01274c6
107a108
> lm2 at wbsio0 port 0xca0/8: W83627DHG
109a111
> lm1: disabling sensors due to alias with lm2
118a121,138
> uhub8 at uhub5 port 1 "ATEN International product 0x8021" rev 1.10/1.00 addr 2
> uhidev2 at uhub8 port 1 configuration 1 interface 0 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev2: iclass 3/1
> ukbd1 at uhidev2: 8 variable keys, 6 key codes
> wskbd2 at ukbd1 mux 1
> wskbd2: connecting to wsdisplay0
> uhidev3 at uhub8 port 1 configuration 1 interface 1 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev3: iclass 3/1, 2 report ids
> uhid0 at uhidev3 reportid 1: input=2, output=0, feature=0
> uhid1 at uhidev3 reportid 2: input=1, output=0, feature=0
> uhidev4 at uhub8 port 1 configuration 1 interface 2 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev4: iclass 3/1
> ums1 at uhidev4: 5 buttons, Z dir
> wsmouse1 at ums1 mux 0
> uhidev5 at uhub8 port 1 configuration 1 interface 3 "ATEN International Co. Ltd GCS1808 V3.2.313" rev 1.10/1.00 addr 3
> uhidev5: iclass 3/1
> ums2 at uhidev5: 3 buttons, Z dir
> wsmouse2 at ums2 mux 0
123c143,147
< root on sd0a (22cb25880c08c19f.a) swap on sd0b dump on sd0b
---
> root on sd0a (3a0229e574e4bfd1.a) swap on sd0b dump on sd0b
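A diff like this one is easier to read once the fields that always differ between two units are masked. A minimal sketch, assuming POSIX sed and diff; the masking patterns are guesses based on the lines above (MAC addresses, disk WWNs, the root disk DUID, clock rates measured at boot), and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: diff the dmesg of two boards after masking fields that differ
# between otherwise identical units, so that what remains (BIOS revision,
# attached devices, sensors) stands out. The patterns are assumptions
# based on one pair of dmesg files; extend them as needed.
# Hypothetical names: normalize_dmesg, dmesg_diff.

# normalize_dmesg FILE: print FILE with per-unit fields replaced by tokens.
normalize_dmesg() {
    sed \
        -e 's/address [0-9a-f][0-9a-f]\(:[0-9a-f][0-9a-f]\)\{5\}/address MAC/' \
        -e 's/naa\.[0-9a-f]*/naa.WWN/' \
        -e 's/root on \([a-z0-9]*\) ([0-9a-f]*\.\([a-z]\))/root on \1 (DUID.\2)/' \
        -e 's/apic clock running at [0-9]*MHz/apic clock running at N MHz/' \
        -e 's/ [0-9][0-9.]* GHz$/ N GHz/' \
        -e 's/, [0-9][0-9.]* MHz$/, N MHz/' \
        "$1"
}

# dmesg_diff A B: exit status as for diff(1): 0 same, 1 different, >1 error.
dmesg_diff() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    normalize_dmesg "$1" > "$a"
    normalize_dmesg "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Run as `dmesg_diff inf1.dmesg.54i inf2.dmesg.54i.good`, this would be expected to drop the MAC, WWN, DUID and clock-rate hunks, leaving differences such as the BIOS revision, the ROM list, the disk port and the lm(4) / USB lines.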



Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread Benny Lofgren
Hi Dewey,

On 2015-09-12 00:38, dewey.hyl...@gmail.com wrote:
> hi all. i’m having difficulty with this board:

I noticed your mail somehow got posted twice, but I'm commenting on the
first incarnation of it because the second had some characters like '\''
mangled (UTF-8 copy/paste issue I presume).

> Supermicro X7SPE-HD-D525 rev1
> i have several similar systems, each running an older version of OpenBSD for 
> a few years without incident. except this one …

You might already have tried this, but providing this information may
give important clues to the rest of us trying to help you:

Since you say that your other similar systems are successfully running
older versions of OpenBSD, have you tried running this new system with a
version that you know works on the other boards?

And if so, then have you tried moving on to subsequent versions in turn
until you find the one which breaks? That is a really important piece of
information.

Also, are those other systems "similar" or "identical"? If not
identical, what differs? This is also important to get a grip on the
problem.

Whether they are identical or not, showing us a dmesg diff with a known
working release booted from both a working and the non-working system
could also be helpful.


Regards,

/Benny

> 
> running OpenBSD 5.7 i386, from cold start it boots just fine and runs until 
> rebooted. once rebooted, however, prior to anything being displayed (i assume 
> this is early in the bios post phase) i get one very long beep. super micro 
> tells me this indicates inability to correctly initialize the memory. okay, 
> so i’ve changed memory for known working components and have the same issue. 
> at this point, the only thing that gets me booting again is to remove power 
> and then restore power. it then boots fine from cold start, and fails on the 
> next reboot (as in, “reboot” from the command line). once in long-beep 
> failure mode, neither the hardware reset button nor the power button can make 
> the machine boot again. the only thing that works is removing power. every 
> once in a while it will reboot successfully, only to fail in the same manner 
> on the next attempt.
> 
> super micro has had me flash bios, clear cmos, boot from different devices 
> and with nothing connected, etc. the results are the same: when rebooting 
> from openbsd, next boot fails until power is removed/restored. super micro 
> blames openbsd.
> 
> i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a 
> reboot every 5 minutes and left it overnight. i logged 554 successful reboots.
> 
> i have since installed the latest available openbsd amd64 snapshot, and am 
> seeing the same failures.
> 
> i’m wondering if something could be disabled (boot -c ?) or if something else 
> raises a red flag and might have a workaround. this has me stumped. i would 
> very much appreciate a clue stick. 

Re: requesting help working around boot failures with supermicro atom board

2015-09-12 Thread John E.P. Hynes
Try booting the SP kernel and see if that works.  If it does, you might
be running into a variant of an issue I've had on my SuperMicro boxen...

-John

On 09/11/2015 06:38 PM, dewey.hyl...@gmail.com wrote:
> hi all. i’m having difficulty with this board:
> 
> Supermicro X7SPE-HD-D525 rev1
> 
> i have several similar systems, each running an older version of OpenBSD for 
> a few years without incident. except this one …
> 
> running OpenBSD 5.7 i386, from cold start it boots just fine and runs until 
> rebooted. once rebooted, however, prior to anything being displayed (i assume 
> this is early in the bios post phase) i get one very long beep. super micro 
> tells me this indicates inability to correctly initialize the memory. okay, 
> so i’ve changed memory for known working components and have the same issue. 
> at this point, the only thing that gets me booting again is to remove power 
> and then restore power. it then boots fine from cold start, and fails on the 
> next reboot (as in, “reboot” from the command line). once in long-beep 
> failure mode, neither the hardware reset button nor the power button can make 
> the machine boot again. the only thing that works is removing power. every 
> once in a while it will reboot successfully, only to fail in the same manner 
> on the next attempt.
> 
> super micro has had me flash bios, clear cmos, boot from different devices 
> and with nothing connected, etc. the results are the same: when rebooting 
> from openbsd, next boot fails until power is removed/restored. super micro 
> blames openbsd.
> 
> i installed linux (same hardware, overwrite openbsd 5.7) and scheduled a 
> reboot every 5 minutes and left it overnight. i logged 554 successful reboots.
> 
> i have since installed the latest available openbsd amd64 snapshot, and am 
> seeing the same failures.
> 
> i’m wondering if something could be disabled (boot -c ?) or if something else 
> raises a red flag and might have a workaround. this has me stumped. i would 
> very much appreciate a clue stick. 

requesting help working around boot failures with supermicro atom board

2015-09-11 Thread Dewey Hylton
hi all. i’m having difficulty with OpenBSD on this board:

Supermicro X7SPE-HF-D525 rev1

i have several similar systems, each running an older version of OpenBSD
for a few years without incident. except this one …

running OpenBSD 5.7 i386 as well as the latest amd64 snapshot, it boots
just fine from a cold start and runs until rebooted. once rebooted, however,
prior to anything being displayed (i assume this is early in the bios post
phase) i get one very long beep. super micro tells me this indicates an
inability to correctly initialize the memory. okay, so i’ve swapped in known
working memory and have the same issue. at this point, the only thing
that gets me booting again is to remove power and then restore power. it
then boots fine from a cold start, and fails on the next reboot (as in,
“reboot” from the command line). once in long-beep failure mode, neither
the hardware reset button nor the power button can make the machine boot
again. the only thing that works is removing power. every once in a while
it will reboot successfully, only to fail in the same manner on the next
attempt.

Supermicro has had me flash the BIOS, clear the CMOS, boot from different
devices and with nothing connected, etc. The results are the same: after a
reboot from OpenBSD, the next boot fails until power is removed and restored.
Supermicro blames OpenBSD.

I installed Linux (same hardware, overwriting OpenBSD 5.7), scheduled a
reboot every 5 minutes and left it overnight. I logged 554 successful
reboots.

I’m wondering if something could be disabled (boot -c?) or if something
else raises a red flag and might have a workaround. This has me stumped; I
would very much appreciate a clue stick.

dmesg follows:

OpenBSD 5.8-current (GENERIC.MP) #1364: Wed Sep  9 17:32:01 MDT 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277665792  (4079MB)
avail mem = 4144070656  (3952MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, C-substates=0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu2: 512KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu3: 512KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 4
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 1 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P8)
acpiprt4 at acpi0: bus 3 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 4 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 4 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 4 int 19
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 4 int 18
usb0 at ehci0: USB revision 2.0

requesting help working around boot failures with supermicro atom board

2015-09-11 Thread dewey.hyl...@gmail.com
Hi all. I’m having difficulty with this board:

Supermicro X7SPE-HD-D525 rev1

I have several similar systems, each of which has run an older version of
OpenBSD for a few years without incident, except this one…

Running OpenBSD 5.7 i386, the machine boots just fine from a cold start and
runs until rebooted. Once rebooted, however, I get one very long beep before
anything is displayed (I assume this is early in the BIOS POST phase).
Supermicro tells me this indicates an inability to initialize the memory
correctly. Okay, so I’ve swapped in known-working memory and have the same
issue. At this point, the only thing that gets me booting again is to remove
power and then restore it. It then boots fine from a cold start, and fails on
the next reboot (as in, “reboot” from the command line). Once in long-beep
failure mode, neither the hardware reset button nor the power button can make
the machine boot again; the only thing that works is removing power. Every
once in a while it will reboot successfully, only to fail in the same manner
on the next attempt.

Supermicro has had me flash the BIOS, clear the CMOS, boot from different
devices and with nothing connected, etc. The results are the same: after a
reboot from OpenBSD, the next boot fails until power is removed and restored.
Supermicro blames OpenBSD.

I installed Linux (same hardware, overwriting OpenBSD 5.7), scheduled a
reboot every 5 minutes and left it overnight. I logged 554 successful reboots.

I have since installed the latest available OpenBSD amd64 snapshot, and am
seeing the same failures.

I’m wondering if something could be disabled (boot -c?) or if something else
raises a red flag and might have a workaround. This has me stumped; I would
very much appreciate a clue stick.

dmesg follows:

OpenBSD 5.8-current (GENERIC.MP) #1364: Wed Sep  9 17:32:01 MDT 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277665792 (4079MB)
avail mem = 4144070656 (3952MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, C-substates=0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu2: 512KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu3: 512KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 4
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 1 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P8)
acpiprt4 at acpi0: bus 3 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 4 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 4 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 4 int 19
ehci0 at pci0 dev 26 function 7 "Intel 82801I