Hi,

I recently decided to go with OpenSolaris for our backup storage system. However, after a few days (or sometimes only hours) the system hangs. I upgraded to SNV131 to use ZFS deduplication and RAIDZ2, and in the meantime I have updated to SNV132. On both builds the hangs started a few days after the upgrade.

Attached is a screenshot, taken through the IPMI interface, of an open top session at the moment the system hung. The only process actively running is rdiff-backup, a Python-based backup system that makes differential backups. The backup scripts are started by a cron job. Sometimes a run completes and produces a backup; other times it just hangs. Once in a while there are also SSH and SFTP sessions in which people or scripts upload, download, or delete files. No SSH sessions were running at the time of the hang in the screenshot, but hangs have also occurred while SSH sessions were open.

The backup destination is a ZFS pool with the following configuration:

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: zpool1
 state: ONLINE
 scrub: scrub in progress for 0h26m, 0.01% done, 7286h37m to go
config:

        NAME        STATE     READ WRITE CKSUM
        zpool1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c7t0d1  ONLINE       0     0     0
            c7t0d2  ONLINE       0     0     0
            c7t0d3  ONLINE       0     0     0
            c7t0d4  ONLINE       0     0     0
            c7t0d5  ONLINE       0     0     0
            c7t0d6  ONLINE       0     0     0
            c7t0d7  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t1d1  ONLINE       0     0     0
            c7t1d2  ONLINE       0     0     0
            c7t1d3  ONLINE       0     0     0
            c7t1d4  ONLINE       0     0     0
        spares
          c10t3d7   AVAIL

errors: No known data errors

zfs get all

NAME    PROPERTY              VALUE                  SOURCE
zpool1  type                  filesystem             -
zpool1  creation              Wed Jan 27 10:48 2010  -
zpool1  used                  7.95T                  -
zpool1  available             12.5T                  -
zpool1  referenced            62.1K                  -
zpool1  compressratio         1.00x                  -
zpool1  mounted               yes                    -
zpool1  quota                 none                   default
zpool1  reservation           none                   default
zpool1  recordsize            128K                   default
zpool1  mountpoint            /zpool1                default
zpool1  sharenfs              off                    default
zpool1  checksum              on                     default
zpool1  compression           off                    default
zpool1  atime                 on                     default
zpool1  devices               on                     default
zpool1  exec                  on                     default
zpool1  setuid                on                     default
zpool1  readonly              off                    default
zpool1  zoned                 off                    default
zpool1  snapdir               hidden                 default
zpool1  aclmode               groupmask              default
zpool1  aclinherit            restricted             default
zpool1  canmount              on                     default
zpool1  shareiscsi            off                    default
zpool1  xattr                 on                     default
zpool1  copies                1                      default
zpool1  version               3                      -
zpool1  utf8only              off                    -
zpool1  normalization         none                   -
zpool1  casesensitivity       sensitive              -
zpool1  vscan                 off                    default
zpool1  nbmand                off                    default
zpool1  sharesmb              off                    default
zpool1  refquota              none                   default
zpool1  refreservation        none                   default
zpool1  primarycache          all                    default
zpool1  secondarycache        all                    default
zpool1  usedbysnapshots       0                      -
zpool1  usedbydataset         62.1K                  -
zpool1  usedbychildren        7.95T                  -
zpool1  usedbyrefreservation  0                      -
zpool1  logbias               latency                default
zpool1  dedup                 on                     local
zpool1  mlslabel              none                   default

Another issue is that scrubbing takes forever. I started a scrub last week, and I believe there is already a known bug about scrubbing with the deduplication feature enabled. The system was already hanging before I started the scrub, so I don't think the scrub is the cause of the hangs. I can't stop the scrub either - the command just hangs.

The hardware is:

System Configuration: Supermicro X8DT3
BIOS Configuration: American Megatrends Inc. 080015 09/24/2009
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz CPU 2
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz CPU 1

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
other       in use 0   P1-DIMM1A           BANK0
other       empty  0   P1-DIMM1B           BANK1
other       in use 0   P1-DIMM2A           BANK2
other       empty  0   P1-DIMM2B           BANK3
other       in use 0   P1-DIMM3A           BANK4
other       empty  0   P1-DIMM3B           BANK5
other       in use 0   P2-DIMM1A           BANK6
other       empty  0   P2-DIMM1B           BANK7
other       in use 0   P2-DIMM2A           BANK8
other       empty  0   P2-DIMM2B           BANK9
other       in use 0   P2-DIMM3A           BANK10
other       empty  0   P2-DIMM3B           BANK11

==== On-Board Devices =====================================

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   available PCI              PCI#1
2   in use    PCI Express      PCI-E#2
3   available PCI              PCI#3
4   in use    PCI Express      PCI#4
5   available PCI Express      PCI-E#5
6   in use    PCI Express      PCI-E#6

All 12 2TB SATA disks in the zpool1 array are configured as pass-through on a SAS backplane connected to an Areca 1640 controller; the spare sits on another Areca 1640 controller. The root pool is on the same controller as the array, on 2 Seagate 500GB disks in RAID1 (done by the controller).

I don't know what other information you would need to troubleshoot this, but I don't think this is normal behavior, even for a developer release.

--
This message posted from opensolaris.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: solaris-crash.jpg
Type: image/jpeg
Size: 154960 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/opensolaris-help/attachments/20100215/2cdb54d8/attachment-0001.jpg>