Re: ACPI? problem with release 8.0 | Perhaps solved?

2010-04-16 Thread Malcolm Kay
My RELEASE-8.0 has now been up for about 2hr, not
long enough to be sure the difficulty is circumvented,
but long enough to look promising. Previously RELEASE-8.0
has not stayed up more than about 4min. 

I tried setting machdep.idle to acpi and then to hlt without
success. But I now have set machdep.idle=spin.

Discovered there can be some problem in trying to set this
too early -- in particular in loader.conf -- presumably because
acpi.ko is not yet loaded. I ended up making sure everything was
ready by putting:
   #!/bin/sh
   echo setting machdep.idle=spin
   /sbin/sysctl machdep.idle=spin
in /etc/rc.local
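
A minimal sketch of a more defensive rc.local fragment, assuming machdep.idle_available lists the methods the kernel offers (as in the sysctl output quoted further down). The function names idle_method_available and set_idle_if_available are hypothetical, introduced here only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if method $2 appears in the
# comma-separated list $1 (e.g. "spin, amdc1e, hlt, acpi,").
idle_method_available() {
    for m in $(printf '%s' "$1" | tr ',' ' '); do
        [ "$m" = "$2" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: set machdep.idle only if the kernel offers
# the requested method; otherwise complain on stderr.
set_idle_if_available() {
    want=$1
    avail=$(sysctl -n machdep.idle_available 2>/dev/null) || return 1
    if idle_method_available "$avail" "$want"; then
        /sbin/sysctl machdep.idle="$want"
    else
        echo "machdep.idle: '$want' not offered (available: $avail)" >&2
        return 1
    fi
}
```

Calling set_idle_if_available spin from /etc/rc.local would then have the same effect as the three lines above, but would refuse a method the kernel does not list.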

To check what is happening I've created /usr/local/bin/sysctldump.sh as:
   #!/bin/sh
   [ -f /tmp/sysctl.dump.4 ] && mv -f /tmp/sysctl.dump.4 /tmp/sysctl.dump.5
   [ -f /tmp/sysctl.dump.3 ] && mv -f /tmp/sysctl.dump.3 /tmp/sysctl.dump.4
   [ -f /tmp/sysctl.dump.2 ] && mv -f /tmp/sysctl.dump.2 /tmp/sysctl.dump.3
   [ -f /tmp/sysctl.dump.1 ] && mv -f /tmp/sysctl.dump.1 /tmp/sysctl.dump.2
   [ -f /tmp/sysctl.dump ] && mv -f /tmp/sysctl.dump /tmp/sysctl.dump.1
   sysctl -ao > /tmp/sysctl.dump
and adding:
   #sysctl dump
   1-59/2  *   *   *   *   root   /usr/local/bin/sysctldump.sh
to /etc/crontab.
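
The same rotation could be written as a loop. This is only a sketch under the same file-naming assumption (/tmp/sysctl.dump, .1 through .5); rotate_dumps and its argument are hypothetical names, not part of the script above:

```shell
#!/bin/sh
# Hypothetical helper: shift base.4 -> base.5, ..., base.1 -> base.2,
# then base -> base.1, skipping generations that do not exist.
rotate_dumps() {
    base=$1
    n=4
    while [ "$n" -ge 1 ]; do
        [ -f "$base.$n" ] && mv -f "$base.$n" "$base.$((n + 1))"
        n=$((n - 1))
    done
    [ -f "$base" ] && mv -f "$base" "$base.1"
    return 0
}
```

In the cron script this would be used as rotate_dumps /tmp/sysctl.dump followed by the sysctl dump itself.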

I feel somewhat concerned that this cronjob may be sufficiently frequent to
prevent the system looking for the idle state and thus circumventing the 
problem in some other way. So I'm not yet convinced that I have a real solution.

I'll try removing the cronjob.

Thanks again for your attention,
Regards,

Malcolm Kay




On Tue, 13 Apr 2010 02:38 pm, Malcolm Kay wrote:
 On Tue, 13 Apr 2010 04:03 am, Ian Smith wrote:
  In freebsd-questions Digest, Vol 306, Issue 1, Message: 18
 
  On Mon, 12 Apr 2010 15:31:33 +0930 Malcolm Kay <malcolm@internode.on.net> wrote:
I desperately need to make some progress on this issue.
 
  Then I suggest taking it to freebsd-acpi@ without passing go
  .. maybe with a bit more data to hand, as outlined in the
  ACPI debugging section of the handbook.

 Yes, I have now realised this, but I am somewhat reluctant to
 move there now and be criticised for cross-posting.

Is it likely that the issue is real rather than hardware
or disk corruption? Earlier releases are operating OK on
the same machine.
 
  Sounds like a real issue, but I don't know the hardware. 
  Does it have the latest available BIOS update?  If not,
  that's step one.  Will it stay up long enough to get a
  verbose dmesg off it?  Do you have a verbose dmesg from an
  earlier working release for comparison?

 Probably not; I have considered it.
 But the manufacturer's site warns not to upgrade unless you
 have identifiable problems (or something similar).
 And since earlier releases work well I'm not anxious to open a
 new can of worms. If I become sufficiently desperate I'll try
 it.

I have now confirmed that:
 debug.acpi.disabled=acad button cpu lid thermal timer video
still leaves the system crashing and powering down
when idle for a while. And the more extensive:
 debug.acpi.disabled=acad bus children button cmbat cpu ec isa lid pci pci_link sysresource thermal timer video
does the same.
   
I don't really need power management but with acpi
disabled the disks are not visible to the system.
 
  ACPI needs to work on modern hardware, no question.
 
Are there sysctl variables that can influence this
behaviour? Currently I believe we have:
   
hw.acpi.supported_sleep_state: S1 S4 S5
hw.acpi.power_button_state: S5
hw.acpi.sleep_button_state: S1
hw.acpi.lid_switch_state: NONE
hw.acpi.standby_state: S1
hw.acpi.suspend_state: NONE
hw.acpi.sleep_delay: 1
hw.acpi.s4bios: 0
hw.acpi.verbose: 0
 
  May help to set hw.acpi.verbose=1 in /boot/loader.conf while
  debugging; especially useful after verbose boot for detail
  in dmesg and messages.

 Looks as though it might be useful, but I'm starting to
 believe acpi itself may not be the problem

hw.acpi.disable_on_reboot: 0
hw.acpi.handle_reboot: 0
hw.acpi.reset_video: 0
hw.acpi.cpu.cx_lowest: C1
 
  Is that with acpi.thermal disabled?

 No, this is run with acpi as default configured.
 Boot | login as root | sysctl -a > sysctl.dump | shutdown -p
 now (Get out before crash so that I don't get into trouble
 with fsck on reboot, yes it runs in the background but takes
 forever.)

 Rebooting in FreeBSD 7.0 I can now mount the 8.0 partitions
 and look at the dump in my own time -- and also prepare these
 emails. (Fsck also runs under 7.0 on the 8.0 partitions if 8.0
 was allowed to crash.)

  If so, showing hw.acpi
  and debug.acpi with everything enabled might provide more
  clues.

 OK

machdep.idle: amdc1e
machdep.idle_available: spin, amdc1e, hlt, acpi,
   
However on the earlier RELEASEs that work I note we do
not have machdep.idle or machdep.idle_available. Instead
I find:
machdep.cpu_idle_hlt: 1
machdep.hlt_cpus: 0
   
Although I've not been 

Re: ACPI? problem with release 8.0 | Perhaps solved?

2010-04-16 Thread Ian Smith
On Fri, 16 Apr 2010 17:13:48 +0930, Malcolm Kay wrote:

  My RELEASE-8.0 has now been up for about 2hr, not
  long enough to be sure the difficulty is circumvented,
  but long enough to look promising. Previously RELEASE-8.0
  has not stayed up more than about 4min. 

Sounds promising ..

  I tried setting machdep.idle to acpi and then to hlt without
  success. But I now have set machdep.idle=spin.

Wow, ok.  I only have a vague idea of how these work, but having to 
change this definitely indicates a bug somewhere; whether your BIOS 
settings or ACPI implementation or kernel or what else, I've no idea.

  Discovered there can be some problem in trying to set this
  too early -- in particular in loader.conf -- presumably because
  acpi.ko is not yet loaded. I ended up making sure everything was
  ready by putting:

Don't presume too easily .. acpi.ko gets loaded really early; it needs 
to be fired up even before scanning busses and initialising most 
devices.  A verbose dmesg.boot should give some indication to anyone 
familiar with what should be.  An acpidump may be useful too.

Can you put files up anywhere to fetch?  If not, you can mail me them, 
they're each too big to attach to -questions.  The usual deal on acpi@ 
is to put up URL(s) to such files; I'd be happy to host them here.

But you really should take this afresh to acpi@ .. they don't bite, the 
worst that can happen is they'll ignore you :) and with a new message 
with the concise story to date, I'd expect someone to take an interest; 
maybe just to say 'turn this off|on' or 'that was fixed in -stable 
last month' or 'try this patch' or 'show us your [whatever]' ..

 #!/bin/sh
 echo setting machdep.idle=spin
 /sbin/sysctl machdep.idle=spin
  in /etc/rc.local

Ok.  dmesg.boot then will show what happens before that gets switched.

If you enable console.log in syslog.conf that change will show up there
after boot messages, maybe other useful stuff, but at least show dmesg.

  To check what is happening I've created /usr/local/bin/sysctldump.sh as:
 #!/bin/sh
 [ -f /tmp/sysctl.dump.4 ] && mv -f /tmp/sysctl.dump.4 /tmp/sysctl.dump.5
 [ -f /tmp/sysctl.dump.3 ] && mv -f /tmp/sysctl.dump.3 /tmp/sysctl.dump.4
 [ -f /tmp/sysctl.dump.2 ] && mv -f /tmp/sysctl.dump.2 /tmp/sysctl.dump.3
 [ -f /tmp/sysctl.dump.1 ] && mv -f /tmp/sysctl.dump.1 /tmp/sysctl.dump.2
 [ -f /tmp/sysctl.dump ] && mv -f /tmp/sysctl.dump /tmp/sysctl.dump.1
 sysctl -ao > /tmp/sysctl.dump
  and adding:
 #sysctl dump
 1-59/2  *   *   *   *   root   /usr/local/bin/sysctldump.sh
  to /etc/crontab.

sysctl -ao is likely Way Too Much Information, though I suppose diffs 
between them might show something useful changing over time.  'sysctl hw 
dev acpi' is probably plenty to chew on.
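
A rough sketch of how two successive dumps might be compared, assuming they are plain 'name: value' text as sysctl prints them. The function name changed_oids is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the newer dump ($2) that
# differ from, or are missing in, the older dump ($1).
changed_oids() {
    # In diff's default output, lines from the second file start with "> ".
    diff "$1" "$2" | sed -n 's/^> //p'
}
```

For example, changed_oids /tmp/sysctl.dump.1 /tmp/sysctl.dump would list whatever changed between the last two cron runs; counters and timestamps will always show up, so the interesting lines still have to be picked out by eye.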

  I feel somewhat concerned that this cronjob may be sufficiently frequent to
  prevent the system looking for the idle state and thus circumventing the 
  problem in some other way. So I'm not yet convinced that I have a real 
  solution.

We're not talking about idle in the sense top shows you - this is about 
the kernel having nothing to do for perhaps hundreds of microseconds so 
entering a microsleep state.  The old 386s just had the HLT instruction 
which had the CPU wait for an interrupt (to save power).  These days 
there are multiple C-states with varying levels of power reduction with 
different latencies, ie times to wake up, usually managed by ACPI.

I suspect 'spin' just loops awaiting an interrupt, staying busy?

C1E is one such newer state.  I know nothing about it, but that's what 
your system thought it should use the amdc1e cpufreq? driver for, so 
your problem definitely seems related to that.  This clearly is within 
the ambit of the acpi@ list, and most of those folks seem rarely to have 
the sort of spare time needed to follow -questions.

Also at least check the change log between your BIOS and the latest; if 
there's anything related to C states or similar, you should try it; they 
always say not to do it unless you need to - you might need to, and that
might be all you need to do.

  I'll try removing the cronjob.
  
  Thanks again for your attention,
  Regards,
  
  Malcolm Kay

Thanks for cc'ing me, I read -digests which can take half a day and make 
replying a bit tedious, not to mention breaking list threading.

cheers, Ian

[..]