Hi, I'm using OpenSolaris on a machine in my basement -- it's exporting files over NFS for my home network. This machine has been running for about a year and a half now, but only recently has it been put under real load. In addition, I recently added a couple of 1TB drives.
The initial symptoms started a few days ago: I'd find the machine powered off at random times. At first, I thought it was my 2-year-old son hitting the power button or something, but it's now clear that wasn't the cause. Now the system sometimes won't boot with the latest kernel. If I select an older kernel (2008-11?) it will boot, but after a few minutes it spontaneously reboots. It also sometimes spontaneously reboots when I do get it booted with the latest kernel.

I suspect it's a hardware problem, but my OpenSolaris knowledge is poor. If this were a Linux machine, I'd know exactly what to try, but I haven't a clue here. I'm tempted to try installing a newer OSOL (snv_125 or so), but don't know whether that would just make things worse.

My questions:

1. What steps should I take to repair an OpenSolaris system that cannot boot, or boots with errors? I was unable to find a good source of information on this. I downloaded the Milax live CD, but I'm not sure how I'd fix anything with that.

2. If I re-install 2009.06, or install the latest 2010.02 dev ISO, will my ZFS pools be safe?

3. Since this host will be a critical part of my home network, I need to know what to do when things go wrong. Can you suggest online resources I should read?

When I boot the system with the latest kernel, it successfully mounts my ZFS filesystems, but then displays:

svc.startd[7]: application/pkg/update:default transitioned to maintenance by request (see 'svcs -xv' for details)

My svcs -xv and svcs -l information is below:

root@weyl:~# svcs -xv
svc:/network/dns/multicast:default (DNS Service Discovery and Multicast DNS)
 State: disabled since Fri Oct 30 06:24:27 2009
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M mdnsd
   See: http://opensolaris.org/os/project/nwam/service-discovery/
Impact: 1 dependent service is not running:
        svc:/system/avahi-bridge-dsd:default

svc:/application/pkg/update:default (image packaging repository)
 State: maintenance since Fri Oct 30 06:34:16 2009
Reason: Maintenance requested by "svc:/application/pkg/update:default"
   See: /var/svc/log/application-pkg-update:default.log
   See: http://sun.com/msg/SMF-8000-R4
   See: /var/svc/log/application-pkg-update:default.log
Impact: This service is not running.
root@weyl:~#

root@weyl:~# svcs -l network/dns/multicast application/pkg/update
fmri         svc:/network/dns/multicast:default
name         DNS Service Discovery and Multicast DNS
enabled      false
state        disabled
next_state   none
state_time   Fri Oct 30 06:24:27 2009
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/network/physical (multiple)
dependency   optional_all/refresh svc:/system/identity:node (online)
dependency   optional_all/none svc:/system/system-log (online)

fmri         svc:/application/pkg/update:default
name         image packaging repository
enabled      true
state        maintenance
next_state   none
state_time   Fri Oct 30 06:34:16 2009
logfile      /var/svc/log/application-pkg-update:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (online)
root@weyl:~#

Tail of application-pkg-update:default.log:

[ Oct 30 06:30:18 Method "start" exited with status 0. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:30:18 Executing stop method ("lib/svc/method/pkg-update stop"). ]
crontab: you are not authorized to use cron. Sorry.
crontab: you are not authorized to use cron. Sorry.
[ Oct 30 06:30:18 Method "stop" exited with status 0. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:30:18 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Leaving maintenance because clear requested. ]
[ Oct 30 06:34:16 Enabled. ]
[ Oct 30 06:34:16 Executing start method ("lib/svc/method/pkg-update start"). ]
crontab: you are not authorized to use cron. Sorry.
crontab: you are not authorized to use cron. Sorry.
[ Oct 30 06:34:16 Method "start" exited with status 0. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Executing stop method ("lib/svc/method/pkg-update stop"). ]
crontab: you are not authorized to use cron. Sorry.
crontab: you are not authorized to use cron. Sorry.
[ Oct 30 06:34:16 Method "stop" exited with status 0. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]
[ Oct 30 06:34:16 Stopping for maintenance due to service_request. ]

I've got a bunch of core files in /:

root@weyl:~# file /core*
/core: ELF 32-bit LSB core file 80386 Version 1, from 'svc.startd'
/core.svc.configd.1256783994.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256783995.11: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256785110.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256785114.12: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256786508.23: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256786511.308: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256786766.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256787700.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256787704.11: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256787707.23: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826745.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826750.19: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826761.48: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826777.124: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256826780.832: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874425.9: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874429.11: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874430.17: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874550.23: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874551.156: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874553.158: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
/core.svc.configd.1256874556.163: ELF 32-bit LSB core file 80386 Version 1, from 'svc.configd'
root@weyl:~#

Hardware information:

root@weyl:/var/svc/log# prtdiag
System Configuration: System manufacturer System Product Name
BIOS Configuration: Phoenix Technologies, LTD ASUS M2N-SLI DELUXE ACPI BIOS Revision 1502 03/31/2008

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
AMD Athlon(tm) 64 X2 Dual Core Processor 6400+ Socket AM2

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
DDR         in use 0   DIMM_B1             Bank0/1
unknown     empty  0   DIMM_B2             Bank2/3
DDR         in use 0   DIMM_A1             Bank4/5
unknown     empty  0   DIMM_A2             Bank6/7

==== On-Board Devices =====================================

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   in use    PCI              PCI1
2   available PCI              PCI2
3   available PCI              PCI3
4   available PCI Express      PCIEX16_1
5   available PCI Express      PCIEX16_2
6   in use    PCI Express      PCIEX1_1
7   available PCI Express      PCIEX1_2
root@weyl:/var/svc/log#
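
In case it helps anyone reading the core file list: the names look like /core.<program>.<number>.<number>, and I'm assuming (not confirmed) that the first number is the crash time in seconds since the epoch and the second is the process ID. Under that assumption, here is a rough sketch that groups the core files into bursts, starting a new burst whenever the gap between consecutive timestamps exceeds a threshold. The function name summarize_cores and the default 300-second threshold are my own invention, and the names must be fed in chronological order. On Solaris the stock /usr/bin/awk may not accept -v, in which case nawk or /usr/xpg4/bin/awk should work instead.

```shell
#!/bin/sh
# Sketch only: assumes core file names of the form
#   /core.<program>.<epoch-seconds>.<pid>
# and that names arrive on stdin in chronological order.
# summarize_cores and the gap threshold are hypothetical helpers, not a system tool.
summarize_cores() {
    gap="${1:-300}"    # seconds of quiet that separate two bursts
    awk -v gap="$gap" '
        {
            # The program name may itself contain dots (svc.configd),
            # so take the timestamp as the second-to-last dot-separated field.
            n = split($0, part, ".")
            if (n < 4) next               # skips a plain "/core"
            ts = part[n - 1] + 0
            if (ts == 0) next             # not a numeric timestamp
            if (bursts == 0 || ts - last > gap) {
                bursts++
                start[bursts] = ts
            }
            count[bursts]++
            last = ts
        }
        END {
            for (i = 1; i <= bursts; i++)
                printf "burst %d: first=%d cores=%d\n", i, start[i], count[i]
        }
    '
}
```

Something like `ls /core.* | summarize_cores 60` would then print one line per burst, which might show whether the svc.configd crashes line up with the spontaneous reboots.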