I create zpools over iSCSI targets in much the same way. At first I used 
the on-board NVIDIA SATA ports on a Tyan S2927, but I had random disk 
problems like the ones you are describing. I switched to SuperMicro 
AOC-SAT2-MV8 or AOC-USAS-L8i controllers and haven't had a problem since.

Are your 1TB disks made by Seagate? If they are Barracuda 7200.11s, you 
should upgrade the drive firmware before all of your disks stop working.
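
If you are not sure what is in those boxes, 'iostat -En' on each server 
prints a Vendor/Product/Revision line for every disk, so you can check 
the drive model and firmware revision against Seagate's advisory before 
doing anything else, e.g.:

   # iostat -En | grep -i revision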

kristof wrote:
> I'm trying to set up a redundant zpool using 2 servers running OpenSolaris 
> b105 with COMSTAR iSCSI.
> 
> Here are my setup details:
> 
> 2 Servers each:
> - tyan S2925
> - 6 x 1TB disks
> - 2 onboard nge nics
> - 1 PCIE IB card: MHEA28-1TC
> 
> partition layout:
> 
> * Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
>   191   128  0      1      2       254    63     1023    32130      58605120  
>   191   0    254    63     1023    254    63     1023    58637250   625121280 
>   191   0    254    63     1023    254    63     1023    683758530  1250242560
> 
> - The first primary partition (disks 1 & 2) is used for rpool
> - The second primary partition is exposed via iSCSI (COMSTAR)
> - The third partition is not used so far
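
(For anyone reproducing this: with the sbd provider, each of those p2 
slices was presumably registered as a logical unit with something along 
the lines of

   # sbdadm create-lu /dev/rdsk/c1t0d0p2

one per data partition on each server, which is what produces the 
list-lu output below. The first block of six LUs appears to be 
comstar1's, the second block comstar2's.)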
> 
> r...@comstar1:~/iser# stmfadm list-lu -v
> LU Name: 600144F05850C200000049884CF70001
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c1t0d0p2
>     View Entry Count  : 1
> LU Name: 600144F05850C200000049884CFB0002
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c1t1d0p2
>     View Entry Count  : 1
> LU Name: 600144F05850C200000049884D060003
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c2t0d0p2
>     View Entry Count  : 1
> LU Name: 600144F05850C200000049884D090004
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c2t1d0p2
>     View Entry Count  : 1
> LU Name: 600144F05850C200000049884D100005
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c5t0d0p2
>     View Entry Count  : 1
> LU Name: 600144F05850C200000049884D130006
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c5t1d0p2
>     View Entry Count  : 1
> 
> LU Name: 600144F0B2174500000049884D550002
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c1t0d0p2
>     View Entry Count  : 1
> LU Name: 600144F0B2174500000049884D5F0003
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c1t1d0p2
>     View Entry Count  : 1
> LU Name: 600144F0B2174500000049884E300004
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c2t0d0p2
>     View Entry Count  : 1
> LU Name: 600144F0B2174500000049884E330005
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c2t1d0p2
>     View Entry Count  : 1
> LU Name: 600144F0B2174500000049884E380006
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c3t0d0p2
>     View Entry Count  : 1
> LU Name: 600144F0B2174500000049884E3A0007
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/rdsk/c3t1d0p2
>     View Entry Count  : 1
> 
> On both servers I created a target group & host group:
> 
> r...@comstar1:~/iser# stmfadm list-tg -v
> Target Group: comstarcluster
>         Member: iqn.1986-03.com.sun:02:72169083-d7a2-cf5f-8b5a-f253fca09ad3
> 
> Host Group: cluster
>         Member: iqn.1986-03.com.sun:01:e00000000000.492d42bc
>         Member: iqn.1986-03.com.sun:01:e00000000000.492d42bd
> 
> Target Group: comstarcluster
>         Member: iqn.1986-03.com.sun:02:687afe9c-97fb-6733-896d-f4fe742ace59
> 
> r...@comstar2:~# stmfadm list-hg -v
> Host Group: cluster
>         Member: iqn.1986-03.com.sun:01:e00000000000.492d42bc
>         Member: iqn.1986-03.com.sun:01:e00000000000.492d42bd
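
Those groups are normally created with stmfadm, along these lines:

   # stmfadm create-tg comstarcluster
   # stmfadm add-tg-member -g comstarcluster <target iqn>
   # stmfadm create-hg cluster
   # stmfadm add-hg-member -g cluster <initiator iqn>

(the iSCSI target generally has to be offline while it is added to the 
target group; the IQNs are the ones shown in the listings above).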
> 
> Then I added a view per server (6 LUNs per view).
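
That step is one add-view per LU against the groups above, roughly:

   # stmfadm add-view -t comstarcluster -h cluster \
       600144F05850C200000049884CF70001

repeated for the remaining five GUIDs on each server.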
> 
> Finally I set up a static iSCSI configuration on server comstar1 and 
> created a mirrored zpool:
> 
> r...@comstar1:~/iser# iscsiadm list static-config
> 
> Static Configuration Target: 
> iqn.1986-03.com.sun:02:687afe9c-97fb-6733-896d-f4fe742ace59,192.168.100.2:3260
> 
> Static Configuration Target: 
> iqn.1986-03.com.sun:02:687afe9c-97fb-6733-896d-f4fe742ace59,192.168.101.2:3260
> 
> Static Configuration Target: 
> iqn.1986-03.com.sun:02:72169083-d7a2-cf5f-8b5a-f253fca09ad3,127.0.0.1:3260
> 
> - 192.168.100.2 is ibd0 on server comstar2
> - 192.168.101.2 is ibd1 on server comstar2
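
For reference, that static configuration is presumably just:

   # iscsiadm add static-config <comstar2 target iqn>,192.168.100.2:3260
   # iscsiadm add static-config <comstar2 target iqn>,192.168.101.2:3260
   # iscsiadm modify discovery --static enable

plus the loopback entry pointing at comstar1's own target on 127.0.0.1.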
> 
> zpool create storagepoolb mirror c3t600144F05850C200000049884CF70001d0 
> c3t600144F0B2174500000049884D550002d0 mirror 
> c3t600144F05850C200000049884CFB0002d0 c3t600144F0B2174500000049884D5F0003d0 
> mirror c3t600144F05850C200000049884D060003d0 
> c3t600144F0B2174500000049884E300004d0 mirror 
> c3t600144F05850C200000049884D090004d0 c3t600144F0B2174500000049884E330005d0 
> mirror c3t600144F05850C200000049884D100005d0 
> c3t600144F0B2174500000049884E380006d0 mirror 
> c3t600144F05850C200000049884D130006d0 c3t600144F0B2174500000049884E3A0007d0
> 
>   pool: storagepoolb
>  state: ONLINE
>  scrub: none requested
> config:
> 
>         NAME                                       STATE     READ WRITE CKSUM
>         storagepoolb                               ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884CF70001d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884D550002d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884CFB0002d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884D5F0003d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D060003d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884E300004d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D090004d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884E330005d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D100005d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884E380006d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D130006d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884E3A0007d0  ONLINE       0     0     0
> 
> 
> Then I created some filesystems and volumes:
> 
> storagepoolb               4.87G  1.71T    21K  /storagepoolb
> storagepoolb/clients         18K  1.71T    18K  /storagepoolb/clients
> storagepoolb/images        4.87G  1.71T    18K  /storagepoolb/images
> storagepoolb/images/vista    16K  1.71T    16K  -
> storagepoolb/images/xp     4.87G  1.71T  4.87G  -
> 
> I exposed storagepoolb/images/xp via iSCSI (COMSTAR) and connected from the 
> same server, so I could copy my iSCSI boot image (a VDI file) onto the raw 
> device.
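
(If you want to replicate that conversion step: with VirtualBox the VDI 
can be flattened to a raw image and then copied onto the exposed disk, 
or written straight to the zvol's raw device, roughly:

   # VBoxManage internalcommands converttoraw xp.vdi xp.raw
   # dd if=xp.raw of=/dev/zvol/rdsk/storagepoolb/images/xp bs=1024k

xp.vdi and xp.raw are placeholder file names here.)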
> 
> So far so good. Now I tried to boot my thin client from this exposed volume.
> 
> The boot started, but after some time it hung. I checked the log files on 
> server comstar1 and found the following error messages:
> 
> Feb  3 18:21:59 comstar1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
> (scsi_vhci0):
> Feb  3 18:21:59 comstar1        
> /scsi_vhci/d...@g600144f05850c200000049884d060003 (sd14): Command Timeout on 
> path /iscsi (iscsi0)
> Feb  3 18:21:59 comstar1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
> (scsi_vhci0):
> Feb  3 18:21:59 comstar1        
> /scsi_vhci/d...@g600144f05850c200000049884d130006 (sd17): Command Timeout on 
> path /iscsi (iscsi0)
> Feb  3 18:21:59 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:21:59 comstar1 last message repeated 1 time
> Feb  3 18:22:09 comstar1 scsi: [ID 107833 kern.warning] WARNING: 
> /scsi_vhci/d...@g600144f05850c200000049884d130006 (sd17):
> Feb  3 18:22:09 comstar1        device busy too long
> Feb  3 18:22:09 comstar1 scsi: [ID 107833 kern.warning] WARNING: 
> /scsi_vhci/d...@g600144f05850c200000049884d060003 (sd14):
> Feb  3 18:22:09 comstar1        device busy too long
> Feb  3 18:22:14 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:23:15 comstar1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
> (scsi_vhci0):
> Feb  3 18:23:15 comstar1        (sd17): path (iscsi0), reset 1 failed
> Feb  3 18:23:15 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x0
> Feb  3 18:23:15 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:24:16 comstar1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
> (scsi_vhci0):
> Feb  3 18:24:16 comstar1        (sd14): path (iscsi0), reset 1 failed
> Feb  3 18:24:16 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x0
> Feb  3 18:24:21 comstar1 scsi: [ID 107833 kern.warning] WARNING: 
> /scsi_vhci/d...@g600144f05850c200000049884d130006 (sd17):
> Feb  3 18:24:21 comstar1        device busy too long
> Feb  3 18:24:21 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:24:21 comstar1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
> (scsi_vhci0):
> Feb  3 18:24:21 comstar1        (sd17): path (iscsi0), reset 1 failed
> Feb  3 18:24:21 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x0
> Feb  3 18:24:21 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:24:26 comstar1 scsi: [ID 107833 kern.warning] WARNING: 
> /scsi_vhci/d...@g600144f05850c200000049884d060003 (sd14):
> Feb  3 18:24:26 comstar1        device busy too long
> Feb  3 18:24:26 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:24:26 comstar1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
> (scsi_vhci0):
> Feb  3 18:24:26 comstar1        (sd14): path (iscsi0), reset 1 failed
> Feb  3 18:24:26 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x0
> Feb  3 18:24:26 comstar1 scsi_vhci: [ID 734749 kern.warning] WARNING: 
> vhci_scsi_reset 0x1
> Feb  3 18:24:31 comstar1 scsi: [ID 107833 kern.warning] WARNING: 
> /scsi_vhci/d...@g600144f05850c200000049884d130006 (sd17):
> Feb  3 18:24:31 comstar1        device busy too long
> 
> 
> Feb 03 2009 18:22:09.990389013 ereport.io.scsi.cmd.disk.dev.rqs.derr
> nvlist version: 0
>         class = ereport.io.scsi.cmd.disk.dev.rqs.derr
>         ena = 0xc68a392180200001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = 
> /iscsi/[email protected]:02:72169083-d7a2-cf5f-8b5a-f253fca09ad30001,2
>                 devid = id1,s...@n600144f05850c200000049884d060003
>         (end detector)
> 
>         driver-assessment = retry
>         op-code = 0x28
>         cdb = 0x28 0x0 0x25 0x42 0x53 0x0 0x0 0x0 0x10 0x0
>         pkt-reason = 0x0
>         pkt-state = 0x3f
>         pkt-stats = 0x0
>         stat-code = 0x2
>         key = 0x6
>         asc = 0x29
>         ascq = 0x0
>         sense-data = 0x70 0x0 0x6 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x29 
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0
>         __ttl = 0x1
>         __tod = 0x49887d41 0x3b082315
> 
> Feb 03 2009 18:22:09.990389005 ereport.io.scsi.cmd.disk.recovered
> nvlist version: 0
>         class = ereport.io.scsi.cmd.disk.recovered
>         ena = 0xc68a392180200001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = 
> /iscsi/[email protected]:02:72169083-d7a2-cf5f-8b5a-f253fca09ad30001,2
>                 devid = id1,s...@n600144f05850c200000049884d060003
>         (end detector)
> 
>         driver-assessment = recovered
>         op-code = 0x28
>         cdb = 0x28 0x0 0x25 0x42 0x53 0x0 0x0 0x0 0x10 0x0
>         pkt-reason = 0x0
>         pkt-state = 0x1f
>         pkt-stats = 0x0
>         __ttl = 0x1
>         __tod = 0x49887d41 0x3b08230d
> 
> Feb 03 2009 18:22:09.990388931 ereport.io.scsi.cmd.disk.recovered
> nvlist version: 0
>         class = ereport.io.scsi.cmd.disk.recovered
>         ena = 0xc68a392180200001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = 
> /iscsi/[email protected]:02:72169083-d7a2-cf5f-8b5a-f253fca09ad30001,2
>                 devid = id1,s...@n600144f05850c200000049884d060003
>         (end detector)
> 
>         driver-assessment = recovered
>         op-code = 0x28
>         cdb = 0x28 0x0 0x0 0x0 0x3 0x0 0x0 0x0 0x10 0x0
>         pkt-reason = 0x0
>         pkt-state = 0x1f
>         pkt-stats = 0x0
>         __ttl = 0x1
>         __tod = 0x49887d41 0x3b0822c3
> 
> Feb 03 2009 18:24:37.976452834 ereport.fs.zfs.io
> nvlist version: 0
>         class = ereport.fs.zfs.io
>         ena = 0xc8b183a3b8100001
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = zfs
>                 pool = 0x8b47077e6fdf66ba
>                 vdev = 0x9ca5aa13d8563613
>         (end detector)
> 
>         pool = storagepoolb
>         pool_guid = 0x8b47077e6fdf66ba
>         pool_context = 0
>         pool_failmode = wait
>         vdev_guid = 0x9ca5aa13d8563613
>         vdev_type = disk
>         vdev_path = /dev/dsk/c3t600144F05850C200000049884D060003d0s0
>         vdev_devid = id1,s...@n600144f05850c200000049884d060003/a
>         parent_guid = 0x78f0b1a8acd2242e
>         parent_type = mirror
>         zio_err = 5
>         zio_offset = 0x10ca000
>         zio_size = 0x2000
>         zio_objset = 0x2f
>         zio_object = 0x1
>         zio_level = 0
>         zio_blkid = 0x2458
>         __ttl = 0x1
>         __tod = 0x49887dd5 0x3a337ce2
> 
> Also zpool status was showing errors:
> 
> pool: storagepoolb
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
> 
>         NAME                                       STATE     READ WRITE CKSUM
>         storagepoolb                               ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884CF70001d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884D550002d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884CFB0002d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884D5F0003d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D060003d0  ONLINE       2     0     0
>             c3t600144F0B2174500000049884E300004d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D090004d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884E330005d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D100005d0  ONLINE       0     0     0
>             c3t600144F0B2174500000049884E380006d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c3t600144F05850C200000049884D130006d0  ONLINE       2     0     0
>             c3t600144F0B2174500000049884E3A0007d0  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> Can someone tell me what is going wrong?
> 
> Thanks in advance
> 
> Kristof
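
One more data point from your logs: the sense data in the first ereport 
(key 0x6, asc/ascq 0x29/0x00) decodes to a Unit Attention, "power on, 
reset, or bus device reset occurred", i.e. the LUs are going away and 
coming back under the initiator rather than returning real media errors, 
which fits the command timeouts and vhci resets above. Once the 
underlying cause is sorted out, the two READ errors can be cleared and 
the pool verified with something like:

   # zpool clear storagepoolb
   # zpool scrub storagepoolb
   # zpool status -v storagepoolb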