[zfs-discuss] Drives going offline in Zpool

2013-03-23 Thread Ram Chander
Hi,

I have a Dell MD1200 connected to two heads (Dell R710). The heads have a
PERC H800 card, and the drives are configured as RAID0 virtual disks on the
RAID controller.

One of the drives crashed and was replaced by a spare. Resilvering was
triggered but fails to complete because drives keep going offline. I have to
reboot the head (R710), after which the drives come back online. This has
happened repeatedly: the resilver hung at 4% done, the head was rebooted,
then it hung again at 27% done, and so on.

The issue happens with both Solaris 11.1 and OmniOS.
It is a 100TB pool with 69TB used. The data is critical and I cannot afford
to lose it.
Can I recover the data somehow (at least partially)?
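
One commonly suggested, hedged approach for getting data off a pool in this
state is to snapshot it, re-import it read-only, and send the datasets
elsewhere so nothing further is written to the failing pool (the backup host
and pool names below are placeholders; whether this works at all depends on
why the drives keep dropping):

# zfs snapshot -r test@evacuate
# zpool export test
# zpool import -o readonly=on test
# zfs send -R test@evacuate | ssh backuphost zfs receive -d backuppool

Note that a read-only import also stops the resilver for as long as the pool
stays imported that way.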

I have verified that there is no hardware issue with the H800 and have also
upgraded the H800 firmware. The issue occurs with both heads.

Current OS: Solaris 11.1

Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@12,0 (sd26):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@c,0 (sd20):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@18,0 (sd32):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1c,0 (sd36):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1b,0 (sd35):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1e,0 (sd38):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@19,0 (sd33):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1d,0 (sd37):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@27,0 (sd47):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@26,0 (sd46):
Mar 22 21:47:55 solaris   Command failed to complete...Device is gone

# zpool status -v

  pool: test
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Mar 20 19:13:40 2013
        27.4T scanned out of 69.6T at 183M/s, 67h11m to go
        2.43T resilvered, 39.32% done
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  DEGRADED     0     0     0
            c8t2d0                  DEGRADED     0     0     0
            c8t3d0                  ONLINE       0     0     0
            spare-4                 DEGRADED     0     0     0
              12459181442598970150  UNAVAIL      0     0     0
              c8t45d0               DEGRADED     0     0     0  (resilvering)
          raidz1-1                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c8t6d0                  ONLINE       0     0     0
            c8t7d0                  ONLINE       0     0     0
            c8t8d0                  ONLINE       0     0     0
            c8t9d0                  ONLINE       0     0     0
          raidz1-3                  DEGRADED     0     0     0
            c8t12d0                 ONLINE       0     0     0
            c8t13d0                 ONLINE       0     0     0
            c8t14d0                 ONLINE       0     0     0
            c8t15d0                 DEGRADED     0     0     0
            c8t16d0                 ONLINE       0     0     0
            c8t17d0                 ONLINE       0     0     0
            c8t18d0                 ONLINE       0     0     0
            c8t19d0                 ONLINE       0     0     0
            c8t20d0                 DEGRADED     0     0     0
            c8t21d0                 DEGRADED     0     0     0
            spare-10                DEGRADED     0     0     0
              c8t22d0               DEGRADED     0     0     0
              c8t47d0               DEGRADED     0

[zfs-discuss] SSD for L2arc

2013-03-21 Thread Ram Chander
Hi,

Could someone tell me how to configure an SSD to be used for L2ARC? Basically
I want to improve read performance.
To increase write performance, will an SSD for the ZIL help? From what I have
read on forums, the ZIL is only used for MySQL/transaction-based writes; I
have regular writes only.
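
For reference, both cache (L2ARC) and log (ZIL/SLOG) devices are added with
zpool add; a minimal sketch, where the pool name test is taken from the other
thread and the SSD device names are placeholders:

# zpool add test cache c2t48d0
# zpool add test log mirror c2t49d0 c2t50d0

L2ARC only helps reads whose working set does not fit in RAM. A separate log
device accelerates synchronous writes (NFS, databases, fsync-heavy
applications); ordinary asynchronous writes are buffered in the ARC and
flushed in transaction groups, so a SLOG will not speed them up.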

Thanks.

Regards,
Ram


[zfs-discuss] Slow zfs writes

2013-02-11 Thread Ram Chander
Hi,

My OmniOS host is experiencing slow ZFS writes (around 30 times slower than
usual). iostat reports the errors below even though the pool is healthy. This
has been happening for the past four days, although no change was made to the
system. Are the hard disks faulty? Please help.
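
A quick, hedged way to see where the time goes while a write is running (the
5-second interval is arbitrary):

# zpool iostat -v test 5
# iostat -xn 5

If a few disks show much higher asvc_t or %b than the rest, those disks or
their paths are suspect; if all disks are mostly idle yet pool throughput is
low, the cause is more likely layout-related, such as nearly full vdevs.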

root@host:~# zpool status -v
  pool: test
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support
        feature flags.
config:

        NAME         STATE     READ WRITE CKSUM
        test         ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
          raidz1-3   ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
          raidz1-4   ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
          raidz1-5   ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
          raidz1-6   ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
        spares
          c5t10d0    AVAIL
          c5t11d0    AVAIL
          c2t45d0    AVAIL
          c2t46d0    AVAIL
          c2t47d0    AVAIL



# iostat -En

c4t0d0           Soft Errors: 0 Hard Errors: 5 Transport Errors: 0
Vendor: iDRAC    Product: Virtual CD       Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 5 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c3t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC    Product: LCDRIVE          Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t0d1           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC    Product: Virtual Floppy   Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0


root@host:~# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a  ZFS-8000-HC    Major

Host: host
Platform: PowerEdge-R810
Product_sn  :

Fault class : fault.fs.zfs.io_failure_wait
Affects : zfs://pool=test
  faulted but still in service
Problem in  : zfs://pool=test
  faulted but still in service

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures.  Refer to http://illumos.org/msg/ZFS-8000-HC for
              more information.

Response: No automated response will be taken.

Impact  : Read and write I/Os cannot be serviced.

Action  : Make sure the affected devices are connected, then run
'zpool clear'.

Regards,
Ram

Re: [zfs-discuss] Slow zfs writes

2013-02-11 Thread Ram Chander
Hi Roy,
You are right, so it looks like a redistribution issue. Initially there were
two vdevs with 24 disks (disks 0-23) for close to a year, after which we
added 24 more disks and created additional vdevs. The initial vdevs are
filled up, so the write speed declined. Now, how do I find the files that
reside on a given vdev or disk? That way I can remove them and copy them back
to redistribute the data. Is there any other way to solve this?

Total capacity of pool - 98TB
Used - 44TB
Free - 54TB
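
As far as I know there is no supported way to list which files live on a
particular vdev or disk; block placement is internal to ZFS. The usual
workaround is to rewrite the data so new blocks get spread across all vdevs.
A hedged sketch using send/receive (the dataset names are placeholders):

# zfs snapshot test/data@rebalance
# zfs send test/data@rebalance | zfs receive test/data.new
# zfs rename test/data test/data.old
# zfs rename test/data.new test/data
# zfs destroy -r test/data.old

Copying with cp or rsync into a new dataset and deleting the originals has
the same effect; either way the rewritten blocks should land mostly on the
emptier vdevs.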

root@host:# zpool iostat -v
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
test        54.0T  62.7T     52  1.12K  2.16M  5.78M
  raidz1    11.2T  2.41T     13     30   176K   146K
    c2t0d0      -      -      5     18  42.1K  39.0K
    c2t1d0      -      -      5     18  42.2K  39.0K
    c2t2d0      -      -      5     18  42.5K  39.0K
    c2t3d0      -      -      5     18  42.9K  39.0K
    c2t4d0      -      -      5     18  42.6K  39.0K
  raidz1    13.3T   308G     13    100   213K   521K
    c2t5d0      -      -      5     94  50.8K   135K
    c2t6d0      -      -      5     94  51.0K   135K
    c2t7d0      -      -      5     94  50.8K   135K
    c2t8d0      -      -      5     94  51.1K   135K
    c2t9d0      -      -      5     94  51.1K   135K
  raidz1    13.4T  19.1T      9    455   743K  2.31M
    c2t12d0     -      -      3    137  69.6K   235K
    c2t13d0     -      -      3    129  69.4K   227K
    c2t14d0     -      -      3    139  69.6K   235K
    c2t15d0     -      -      3    131  69.6K   227K
    c2t16d0     -      -      3    141  69.6K   235K
    c2t17d0     -      -      3    132  69.5K   227K
    c2t18d0     -      -      3    142  69.6K   235K
    c2t19d0     -      -      3    133  69.6K   227K
    c2t20d0     -      -      3    143  69.6K   235K
    c2t21d0     -      -      3    133  69.5K   227K
    c2t22d0     -      -      3    143  69.6K   235K
    c2t23d0     -      -      3    133  69.5K   227K
  raidz1    2.44T  16.6T      5    103   327K   485K
    c2t24d0     -      -      2     48  50.8K  87.4K
    c2t25d0     -      -      2     49  50.7K  87.4K
    c2t26d0     -      -      2     49  50.8K  87.3K
    c2t27d0     -      -      2     49  50.8K  87.3K
    c2t28d0     -      -      2     49  50.8K  87.3K
    c2t29d0     -      -      2     49  50.8K  87.3K
    c2t30d0     -      -      2     49  50.8K  87.3K
  raidz1    8.18T  10.8T      5    295   374K  1.54M
    c2t31d0     -      -      2    131  58.2K   279K
    c2t32d0     -      -      2    131  58.1K   279K
    c2t33d0     -      -      2    131  58.2K   279K
    c2t34d0     -      -      2    132  58.2K   279K
    c2t35d0     -      -      2    132  58.1K   279K
    c2t36d0     -      -      2    133  58.3K   279K
    c2t37d0     -      -      2    133  58.2K   279K
  raidz1    5.42T  13.6T      5    163   383K   823K
    c2t38d0     -      -      2     61  59.4K   146K
    c2t39d0     -      -      2     61  59.3K   146K
    c2t40d0     -      -      2     61  59.4K   146K
    c2t41d0     -      -      2     61  59.4K   146K
    c2t42d0     -      -      2     61  59.3K   146K
    c2t43d0     -      -      2     62  59.2K   146K
    c2t44d0     -      -      2     62  59.3K   146K


On Mon, Feb 11, 2013 at 10:23 PM, Roy Sigurd Karlsbakk <r...@karlsbakk.net>
wrote:


 root@host:~# fmadm faulty
 --------------- ------------------------------------  -------------- ---------
 TIME            EVENT-ID                              MSG-ID         SEVERITY
 --------------- ------------------------------------  -------------- ---------
 Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a  ZFS-8000-HC    Major

 Host: host
 Platform: PowerEdge-R810
 Product_sn  :

 Fault class : fault.fs.zfs.io_failure_wait
 Affects : zfs://pool=test
   faulted but still in service
 Problem in  : zfs://pool=test
   faulted but still in service

 Description : The ZFS pool has experienced currently unrecoverable I/O
               failures.  Refer to http://illumos.org/msg/ZFS-8000-HC for
               more information.

 Response: No automated response will be taken.

 Impact  : Read and write I/Os cannot be serviced.

 Action  : Make sure the affected devices are connected, then run
 'zpool clear'.
 --

 The pool looks healthy to me, but it isn't very well balanced. Have you
 been adding new VDEVs on your way to grow it? Check whether some of the
 VDEVs are fuller than others. I don't have an OI/IllumOS system available
 ATM, but IIRC this can be done with zpool iostat -v. Older versions of ZFS
 striped to all VDEVs regardless of fill, which slowed down the write speeds
 rather horribly if some VDEVs were full (90%). This shouldn't be the case
 with OmniOS, but it *may* be the case with an old zpool version. I don't
 know.

 I'd check fill 
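
Working the fill levels out from the zpool iostat -v output earlier in the
thread (rough figures, matching the vdevs by their member disks):

  raidz1-0 (c2t0-4):    11.2T / 13.6T   ~ 82% full
  raidz1-1 (c2t5-9):    13.3T / 13.6T   ~ 98% full
  raidz1-3 (c2t12-23):  13.4T / 32.5T   ~ 41% full
  raidz1-4 (c2t24-30):  2.44T / 19.0T   ~ 13% full
  raidz1-5 (c2t31-37):  8.18T / 19.0T   ~ 43% full
  raidz1-6 (c2t38-44):  5.42T / 19.0T   ~ 28% full

The first two vdevs are much fuller than the later ones (raidz1-1 is almost
completely full), which fits the slow-write symptom Roy describes.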

[zfs-discuss] Bp rewrite

2013-02-11 Thread Ram Chander
Hi,

Does anyone know if there has been any progress on bp_rewrite? It is much
awaited for solving the redistribution issue and for moving vdevs.

Regards,
Ram