Hi,

I'm using OpenSolaris on a machine in my basement -- it's exporting
files over NFS for my home network. This machine has been running for
about a year and a half now, but only recently has it been stressed.
In addition, I recently added a couple of 1TB drives.

The initial symptoms started a few days ago: I'd find the machine
powered off at random times. At first, I thought it was my 2-year-old
son hitting the power button or something, but it's now clear that
wasn't the cause.

Now the system sometimes won't boot with the latest kernel. If I
select an older kernel (2008-11?) it will boot, but after a few
minutes it spontaneously reboots. It also sometimes spontaneously
reboots when I do get it booted with the latest kernel.

I suspect it's a hardware problem, but my OpenSolaris knowledge is
poor. If this were a Linux machine, I'd know exactly what to try, but
I haven't a clue here.

I'm tempted to try installing a newer OSOL (snv_125 or so), but don't
know whether that would just make things worse.

My questions:
1. What steps should I take to repair an OpenSolaris system that
cannot boot, or boots with errors? I was unable to find a good source
of information on this. I downloaded the Milax live CD, but I'm not
sure how I'd fix anything with that.
2. If I re-install 2009.06, or install the latest 2010.02 dev ISO,
will my ZFS pools be safe?
3. Since this host will be a critical part of my home network, I need
to know what to do when things go wrong. Can you suggest online
resources I should read?

When I boot the system with the latest kernel, it successfully mounts
my ZFS filesystems, but then displays:
svc.startd[7]: application/pkg/update:default transitioned to maintenance by request (see 'svcs -xv' for details)

My svcs -xv and svcs -l information is below:
root@weyl:~# svcs -xv
svc:/network/dns/multicast:default (DNS Service Discovery and Multicast DNS)
 State: disabled since Fri Oct 30 06:24:27 2009
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M mdnsd
   See: http://opensolaris.org/os/project/nwam/service-discovery/
Impact: 1 dependent service is not running:
        svc:/system/avahi-bridge-dsd:default

svc:/application/pkg/update:default (image packaging repository)
 State: maintenance since Fri Oct 30 06:34:16 2009
Reason: Maintenance requested by "svc:/application/pkg/update:default"
   See: /var/svc/log/application-pkg-update:default.log
   See: http://sun.com/msg/SMF-8000-R4
   See: /var/svc/log/application-pkg-update:default.log
Impact: This service is not running.
root@weyl:~#

root@weyl:~# svcs -l network/dns/multicast application/pkg/update
fmri         svc:/network/dns/multicast:default
name         DNS Service Discovery and Multicast DNS
enabled      false
state        disabled
next_state   none
state_time   Fri Oct 30 06:24:27 2009
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/network/physical (multiple)
dependency   optional_all/refresh svc:/system/identity:node (online)
dependency   optional_all/none svc:/system/system-log (online)

fmri         svc:/application/pkg/update:default
name         image packaging repository
enabled      true
state        maintenance
next_state   none
state_time   Fri Oct 30 06:34:16 2009
logfile      /var/svc/log/application-pkg-update:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (online)
root@weyl:~#

Tail of application-pkg-update:default.log:
[ Oct 30 06:30:18 Method "start" exited with status 0. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:30:18 Executing stop method ("lib/svc/method/pkg-update stop"). ]
crontab: you are not authorized to use cron.  Sorry.
crontab: you are not authorized to use cron.  Sorry.
[ Oct 30 06:30:18 Method "stop" exited with status 0. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Leaving maintenance because clear requested. ]
[ Oct 30 06:34:16 Enabled. ]
[ Oct 30 06:34:16 Executing start method ("lib/svc/method/pkg-update start"). ]
crontab: you are not authorized to use cron.  Sorry.
crontab: you are not authorized to use cron.  Sorry.
[ Oct 30 06:34:16 Method "start" exited with status 0. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Executing stop method ("lib/svc/method/pkg-update stop"). ]
crontab: you are not authorized to use cron.  Sorry.
crontab: you are not authorized to use cron.  Sorry.
[ Oct 30 06:34:16 Method "stop" exited with status 0. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]


I've got a bunch of core files in /:
root@weyl:~# file /core*
/core:          ELF 32-bit LSB core file 80386 Version 1, from 'svc.startd'
/core.svc.configd.1256783994.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256783995.11: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256785110.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256785114.12: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256786508.23: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256786511.308: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256786766.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256787700.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256787704.11: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256787707.23: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826745.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826750.19: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826761.48: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826777.124: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826780.832: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874425.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874429.11: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874430.17: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874550.23: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874551.156: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874553.158: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874556.163: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
root@weyl:~#
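
The core file names in the listing look like they follow a
core.<program>.<unix time>.<pid> layout. That layout is an assumption
read off the names themselves, not something confirmed against the
coreadm settings on this machine. Under that assumption, a minimal
Python sketch like the one below could turn the names into UTC
timestamps and group them into bursts, which might help line the
crashes up with the reboots. The function names and the 10-minute gap
are hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed name layout: core.<program>.<unix time>.<pid>
CORE_NAME = re.compile(r"core\.(?P<prog>.+)\.(?P<epoch>\d+)\.(?P<pid>\d+)$")


def parse_core_name(path):
    """Return (program, UTC datetime, pid), or None if the name does not match."""
    m = CORE_NAME.search(path)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc)
    return m.group("prog"), when, int(m.group("pid"))


def group_bursts(times, gap_seconds=600):
    """Group datetimes into bursts separated by more than gap_seconds."""
    bursts = []
    for t in sorted(times):
        if bursts and (t - bursts[-1][-1]).total_seconds() <= gap_seconds:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


if __name__ == "__main__":
    # A few names copied from the listing above; plain /core is skipped
    # because it carries no timestamp.
    names = [
        "/core",
        "/core.svc.configd.1256783994.9",
        "/core.svc.configd.1256783995.11",
        "/core.svc.configd.1256785110.9",
        "/core.svc.configd.1256874556.163",
    ]
    parsed = [p for p in map(parse_core_name, names) if p is not None]
    for burst in group_bursts(when for _, when, _ in parsed):
        print(burst[0].isoformat(), len(burst), "core file(s)")
```

Each printed line is the start of one burst of core dumps, in UTC,
followed by how many core files fall into that burst.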

Hardware information:
root@weyl:/var/svc/log# prtdiag
System Configuration: System manufacturer System Product Name
BIOS Configuration: Phoenix Technologies, LTD ASUS M2N-SLI DELUXE ACPI BIOS Revision 1502 03/31/2008

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
AMD Athlon(tm) 64 X2 Dual Core Processor 6400+ Socket AM2

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
DDR         in use 0   DIMM_B1             Bank0/1
unknown     empty  0   DIMM_B2             Bank2/3
DDR         in use 0   DIMM_A1             Bank4/5
unknown     empty  0   DIMM_A2             Bank6/7

==== On-Board Devices =====================================

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   in use    PCI              PCI1
2   available PCI              PCI2
3   available PCI              PCI3
4   available PCI Express      PCIEX16_1
5   available PCI Express      PCIEX16_2
6   in use    PCI Express      PCIEX1_1
7   available PCI Express      PCIEX1_2
root@weyl:/var/svc/log#
