Hi,

I recently decided to go with OpenSolaris for our backup storage system. 
However, after a few days (or sometimes hours) the system hangs. I upgraded to 
SNV131 to use ZFS deduplication and RAIDZ2, and have since updated to SNV132. 
In both versions the hangs have occurred within a few days of the upgrade.

Attached is a screenshot, taken through the IPMI interface, of an open top 
session at the moment the system hung. The only process actively running is an 
rdiff-backup process (a Python-based backup tool that makes differential 
backups). The backup scripts are started by a cron job. Sometimes a run 
completes and produces a backup; other times it just hangs.
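
To tell a hung backup process apart from a hung system, the cron job could call 
the backup through a small wrapper that logs start and end times and enforces a 
time limit. This is only a minimal sketch: the function name, the time limit, 
and any paths passed to it are hypothetical and not part of the setup described 
here.

```python
import logging
import subprocess

# Timestamped log lines make it possible to see afterwards when a run
# started and whether it ever finished.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd (a list of arguments); return 'ok', 'failed' or 'timeout'."""
    logging.info("starting: %s", " ".join(cmd))
    try:
        # subprocess.run kills the child and raises TimeoutExpired if the
        # child has not exited within timeout_seconds.
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        logging.error("no exit after %s s: %s", timeout_seconds, " ".join(cmd))
        return "timeout"
    status = "ok" if result.returncode == 0 else "failed"
    logging.info("finished (%s), exit code %d", status, result.returncode)
    return status
```

A cron entry would then call something like 
run_with_timeout(["rdiff-backup", SOURCE, DESTINATION], 6 * 3600), where SOURCE, 
DESTINATION and the six-hour limit are placeholders. If the process is blocked 
inside the kernel, the kill may not take effect either, but the log would still 
show a start line without a matching end line.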

There are also occasional SSH and SFTP sessions in which people or scripts 
upload, download, or delete files. No SSH sessions were running at the time of 
the screenshot, but hangs have also happened while SSH sessions were open.

The backup destination is the ZFS pool zpool1. zpool status shows the following 
configuration:

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: zpool1
 state: ONLINE
 scrub: scrub in progress for 0h26m, 0.01% done, 7286h37m to go
config:

        NAME        STATE     READ WRITE CKSUM
        zpool1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c7t0d1  ONLINE       0     0     0
            c7t0d2  ONLINE       0     0     0
            c7t0d3  ONLINE       0     0     0
            c7t0d4  ONLINE       0     0     0
            c7t0d5  ONLINE       0     0     0
            c7t0d6  ONLINE       0     0     0
            c7t0d7  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t1d1  ONLINE       0     0     0
            c7t1d2  ONLINE       0     0     0
            c7t1d3  ONLINE       0     0     0
            c7t1d4  ONLINE       0     0     0
        spares
          c10t3d7   AVAIL   

errors: No known data errors

zfs get all

NAME    PROPERTY              VALUE                  SOURCE
zpool1  type                  filesystem             -
zpool1  creation              Wed Jan 27 10:48 2010  -
zpool1  used                  7.95T                  -
zpool1  available             12.5T                  -
zpool1  referenced            62.1K                  -
zpool1  compressratio         1.00x                  -
zpool1  mounted               yes                    -
zpool1  quota                 none                   default
zpool1  reservation           none                   default
zpool1  recordsize            128K                   default
zpool1  mountpoint            /zpool1                default
zpool1  sharenfs              off                    default
zpool1  checksum              on                     default
zpool1  compression           off                    default
zpool1  atime                 on                     default
zpool1  devices               on                     default
zpool1  exec                  on                     default
zpool1  setuid                on                     default
zpool1  readonly              off                    default
zpool1  zoned                 off                    default
zpool1  snapdir               hidden                 default
zpool1  aclmode               groupmask              default
zpool1  aclinherit            restricted             default
zpool1  canmount              on                     default
zpool1  shareiscsi            off                    default
zpool1  xattr                 on                     default
zpool1  copies                1                      default
zpool1  version               3                      -
zpool1  utf8only              off                    -
zpool1  normalization         none                   -
zpool1  casesensitivity       sensitive              -
zpool1  vscan                 off                    default
zpool1  nbmand                off                    default
zpool1  sharesmb              off                    default
zpool1  refquota              none                   default
zpool1  refreservation        none                   default
zpool1  primarycache          all                    default
zpool1  secondarycache        all                    default
zpool1  usedbysnapshots       0                      -
zpool1  usedbydataset         62.1K                  -
zpool1  usedbychildren        7.95T                  -
zpool1  usedbyrefreservation  0                      -
zpool1  logbias               latency                default
zpool1  dedup                 on                     local
zpool1  mlslabel              none                   default

Another issue is that the scrub is taking forever. I started a scrub last week, 
and I believe there is already a known bug involving scrubs and the 
deduplication feature. The system also hung before I started the scrub, so I 
don't think the scrub is the cause. I can't stop the scrub either: the command 
just hangs.
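
As a rough check of the scrub figures in the zpool status output above (0h26m 
elapsed, 0.01% done, 7286h37m to go), the arithmetic can be sketched as below. 
It assumes the estimate is a simple linear extrapolation of progress so far; 
that is an assumption, and the actual calculation in zpool may differ. The 
function names are made up for illustration.

```python
def remaining_hours(elapsed_minutes, percent_done):
    """Hours left if the scrub keeps its average pace so far (linear guess)."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    total_minutes = elapsed_minutes / (percent_done / 100.0)
    return (total_minutes - elapsed_minutes) / 60.0


def implied_percent(elapsed_minutes, remaining_minutes):
    """Percent done implied by an elapsed time and a time-to-go estimate."""
    return 100.0 * elapsed_minutes / (elapsed_minutes + remaining_minutes)


if __name__ == "__main__":
    # Figures from the zpool status output: 0h26m elapsed, 0.01% done.
    print("hours to go at exactly 0.01%%: %.1f" % remaining_hours(26, 0.01))
    # 7286h37m to go, expressed in minutes.
    to_go = 7286 * 60 + 37
    print("percent implied by 7286h37m: %.4f" % implied_percent(26, to_go))
```

Under that assumption, the reported time to go corresponds to a bit under 0.006% 
done, which is displayed as 0.01% after rounding, so the two figures are 
consistent with each other. Either way the scrub would need thousands of hours 
at the current pace.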

The hardware is:

System Configuration: Supermicro X8DT3
BIOS Configuration: American Megatrends Inc. 080015  09/24/2009
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz CPU 2
Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz CPU 1

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
other       in use 0   P1-DIMM1A           BANK0
other       empty  0   P1-DIMM1B           BANK1
other       in use 0   P1-DIMM2A           BANK2
other       empty  0   P1-DIMM2B           BANK3
other       in use 0   P1-DIMM3A           BANK4
other       empty  0   P1-DIMM3B           BANK5
other       in use 0   P2-DIMM1A           BANK6
other       empty  0   P2-DIMM1B           BANK7
other       in use 0   P2-DIMM2A           BANK8
other       empty  0   P2-DIMM2B           BANK9
other       in use 0   P2-DIMM3A           BANK10
other       empty  0   P2-DIMM3B           BANK11

==== On-Board Devices =====================================

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   available PCI              PCI#1
2   in use    PCI Express      PCI-E#2
3   available PCI              PCI#3
4   in use    PCI Express      PCI#4
5   available PCI Express      PCI-E#5
6   in use    PCI Express      PCI-E#6

All twelve 2TB SATA disks in the zpool1 array are passed through on a SAS 
backplane connected to an Areca 1640 controller; the spare sits on another 
Areca 1640 controller. The root pool is on the same controller, on 2 Seagate 
500GB disks in RAID1 (done on the controller).

I don't know what other information you would need to troubleshoot this, but I 
don't think this is normal behavior, even for a developer release.
-- 
This message posted from opensolaris.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: solaris-crash.jpg
Type: image/jpeg
Size: 154960 bytes
Desc: not available
URL: 
<http://mail.opensolaris.org/pipermail/opensolaris-help/attachments/20100215/2cdb54d8/attachment-0001.jpg>
