Re: [zfs-discuss] External SATA drive enclosures + ZFS?

2011-02-25 Thread Mike Tancsa
On 2/25/2011 7:34 PM, Rich Teer wrote:
 
 One product that seems to fit the bill is the StarTech.com S352U2RER,
 an external dual SATA disk enclosure with USB and eSATA connectivity
 (I'd be using the USB port).  Here's a link to the specific product
 I'm considering:
 
 http://ca.startech.com/product/S352U2RER-35in-eSATA-USB-Dual-SATA-Hot-Swap-External-RAID-Hard-Drive-Enclosure

I have had mixed results with their 4 bay version.  When they work, they
are great, but we have had a number of DOA/almost DOA units.  I have had
good luck with products from
http://www.addonics.com/
(They ship to Canada as well without issue)

Why use USB?  You will get much better performance/throughput on eSATA
(if you have good drivers, of course). I use their sil3124 eSATA
controller on FreeBSD as well as a number of port multiplier (PM) units,
and they work great.

---Mike


-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] multiple disk failure (solved?)

2011-02-01 Thread Mike Tancsa
On 1/31/2011 4:19 PM, Mike Tancsa wrote:
 On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
 Hi Mike,

 Yes, this is looking much better.

 Some combination of removing corrupted files indicated in the zpool
 status -v output, running zpool scrub and then zpool clear should
 resolve the corruption, but it depends on how bad the corruption is.

 First, I would try the least destructive method: Try to remove the
 files listed below by using the rm command.

 This entry probably means that the metadata is corrupted or some
 other file (like a temp file) no longer exists:

 tank1/argus-data:0xc6
 
 
 Hi Cindy,
   I removed the files that were listed, and now I am left with
 
 errors: Permanent errors have been detected in the following files:
 
 tank1/argus-data:0xc5
 tank1/argus-data:0xc6
 tank1/argus-data:0xc7
 
 I have started a scrub
  scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go


Looks like that was it!  The scrub finished in the time it estimated and
that was all I needed to do. I did not have to run zpool clear or any
other commands.  Is there anything beyond scrub to check the integrity
of the pool?

0(offsite)# zpool status -v
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#
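
For anyone who wants to script the "is everything ONLINE with zero error counters" check instead of reading the table by eye, here is a minimal Python sketch. It only parses the text that zpool status prints; it assumes the normal whitespace-separated columns (NAME, STATE, READ, WRITE, CKSUM), and the function name and the SAMPLE excerpt are made up for illustration. It is not a substitute for a scrub.

```python
import re

# Device states that zpool status can report for a pool or vdev.
STATES = "ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED"
ROW = re.compile(r"^\s*(\S+)\s+(" + STATES + r")\s+(\S+)\s+(\S+)\s+(\S+)")


def unhealthy_devices(status_text):
    """Return (name, state, read, write, cksum) for every config row that
    is not ONLINE or that has a non-zero error counter."""
    problems = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
        if not in_config:
            continue
        match = ROW.match(line)
        if not match:
            continue  # header row, blank line, or something unexpected
        name, state = match.group(1), match.group(2)
        counters = match.group(3, 4, 5)
        if state != "ONLINE" or any(c != "0" for c in counters):
            problems.append((name, state) + counters)
    return problems


# Illustrative excerpt shaped like the output in this thread.
SAMPLE = """  pool: tank1
 state: UNAVAIL
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
"""

for row in unhealthy_devices(SAMPLE):
    print(row)
```

An empty list means every row in the config section was ONLINE with all three counters at 0, which is the state shown in the output above.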


---Mike


Re: [zfs-discuss] multiple disk failure (solved?)

2011-01-31 Thread Mike Tancsa
On 1/29/2011 6:18 PM, Richard Elling wrote:
 
 On Jan 29, 2011, at 12:58 PM, Mike Tancsa wrote:
 
 On 1/29/2011 12:57 PM, Richard Elling wrote:
 0(offsite)# zpool status
 pool: tank1
 state: UNAVAIL
 status: One or more devices could not be opened.  There are insufficient
   replicas for the pool to continue functioning.
 action: Attach the missing device and online it using 'zpool online'.
  see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         tank1       UNAVAIL      0     0     0  insufficient replicas
           raidz1    ONLINE       0     0     0
             ad0     ONLINE       0     0     0
             ad1     ONLINE       0     0     0
             ad4     ONLINE       0     0     0
             ad6     ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             ada4    ONLINE       0     0     0
             ada5    ONLINE       0     0     0
             ada6    ONLINE       0     0     0
             ada7    ONLINE       0     0     0
           raidz1    UNAVAIL      0     0     0  insufficient replicas
             ada0    UNAVAIL      0     0     0  cannot open
             ada1    UNAVAIL      0     0     0  cannot open
             ada2    UNAVAIL      0     0     0  cannot open
             ada3    UNAVAIL      0     0     0  cannot open
 0(offsite)#

 This is usually easily solved without data loss by making the
 disks available again.  Can you read anything from the disks using
 any program?

 That's the strange thing: the disks are readable.  The drive cage just
 reset a couple of times prior to the crash. But they seem OK now.  Same
 order as well.

 # camcontrol devlist
 WDC WD\021501FASR\25500W2B0 \200 0956  at scbus0 target 0 lun 0
 (pass0,ada0)
 WDC WD\021501FASR\25500W2B0 \200 05.01D\0205  at scbus0 target 1 lun 0
 (pass1,ada1)
 WDC WD\021501FASR\25500W2B0 \200 05.01D\0205  at scbus0 target 2 lun 0
 (pass2,ada2)
 WDC WD\021501FASR\25500W2B0 \200 05.01D\0205  at scbus0 target 3 lun 0
 (pass3,ada3)


 # dd if=/dev/ada2 of=/dev/null count=20 bs=1024
 20+0 records in
 20+0 records out
 20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
 0(offsite)#
 
 The next step is to run zdb -l and look for all 4 labels. Something like:
   zdb -l /dev/ada2
 
 If all 4 labels exist for each drive and appear intact, then look more closely
 at how the OS locates the vdevs. If you can't solve the UNAVAIL problem,
 you won't be able to import the pool.
  -- richard

On 1/29/2011 10:13 PM, James R. Van Artsdalen wrote:
 On 1/28/2011 4:46 PM, Mike Tancsa wrote:

 I had just added another set of disks to my zfs array. It looks like the
 drive cage with the new drives is faulty.  I had added a couple of files
 to the main pool, but not much.  Is there any way to restore the pool
 below ? I have a lot of files on ad0,1,4,6 and ada4,5,6,7 and perhaps
 one file on the new drives in the bad cage.

 Get another enclosure and verify it works OK.  Then move the disks from
 the suspect enclosure to the tested enclosure and try to import the pool.

 The problem may be cabling or the controller instead - you didn't
 specify how the disks were attached or which version of FreeBSD you're
 using.


First off, thanks to all who responded on and off list!

Good news (for me), it seems: with the new cage, everything seems to be
recognized correctly.  The history is

...
2010-04-22.14:27:38 zpool add tank1 raidz /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
2010-06-11.13:49:33 zfs create tank1/argus-data
2010-06-11.13:49:41 zfs create tank1/argus-data/previous
2010-06-11.13:50:38 zfs set compression=off tank1/argus-data
2010-08-06.12:20:59 zpool replace tank1 ad1 ad1
2010-09-16.10:17:51 zpool upgrade -a
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

FreeBSD RELENG_8 from last week, 8G of RAM, amd64.

 zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0

Re: [zfs-discuss] multiple disk failure (solved?)

2011-01-31 Thread Mike Tancsa
On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
 Hi Mike,
 
 Yes, this is looking much better.
 
 Some combination of removing corrupted files indicated in the zpool
 status -v output, running zpool scrub and then zpool clear should
 resolve the corruption, but it depends on how bad the corruption is.
 
 First, I would try the least destructive method: Try to remove the
 files listed below by using the rm command.
 
 This entry probably means that the metadata is corrupted or some
 other file (like a temp file) no longer exists:
 
 tank1/argus-data:0xc6


Hi Cindy,
I removed the files that were listed, and now I am left with

errors: Permanent errors have been detected in the following files:

tank1/argus-data:0xc5
tank1/argus-data:0xc6
tank1/argus-data:0xc7

I have started a scrub
 scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

I will report back once the scrub is done!
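
As a sanity check on the "to go" figure, here is a minimal sketch of a straight-line extrapolation from the elapsed time and percent done. This is only my own arithmetic, not how ZFS computes its estimate, and the function names are made up for illustration.

```python
def scrub_remaining_minutes(elapsed_minutes, percent_done):
    """Linear estimate of remaining scrub time, assuming a constant rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_minutes = elapsed_minutes * 100.0 / percent_done
    return total_minutes - elapsed_minutes


def as_hm(minutes):
    """Format a number of minutes in the 0h00m style used by zpool status."""
    whole = int(round(minutes))
    return "%dh%02dm" % divmod(whole, 60)


# 0h48m elapsed at 10.90% done, as in the status line above.
print(as_hm(scrub_remaining_minutes(48, 10.90)))
```

The linear estimate comes out within a few minutes of the 6h35m that zpool status reported, so the reported figure looks consistent with a steady scrub rate so far.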

---Mike


Re: [zfs-discuss] multiple disk failure

2011-01-30 Thread Mike Tancsa
On 1/30/2011 12:39 AM, Richard Elling wrote:
 Hmmm, doesn't look good on any of the drives.
 
 I'm not sure of the way BSD enumerates devices.  Some clever person thought
 that hiding the partition or slice would be useful. I don't find it useful.
 On a Solaris system, ZFS can show a disk something like c0t1d0, but that
 doesn't exist. The actual data is in slice 0, so you need to use c0t1d0s0
 as the argument to zdb.

I think it's the right syntax.  On the older drives,


0(offsite)# zdb -l /dev/ada0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3
0(offsite)# zdb -l /dev/ada4

LABEL 0

    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338

LABEL 1

    version=15
    name='tank1'
    state=0
    txg=44592523
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338

LABEL 2

    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338

Re: [zfs-discuss] multiple disk failure

2011-01-29 Thread Mike Tancsa
On 1/29/2011 12:57 PM, Richard Elling wrote:
 0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
 status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
 action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         tank1       UNAVAIL      0     0     0  insufficient replicas
           raidz1    ONLINE       0     0     0
             ad0     ONLINE       0     0     0
             ad1     ONLINE       0     0     0
             ad4     ONLINE       0     0     0
             ad6     ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             ada4    ONLINE       0     0     0
             ada5    ONLINE       0     0     0
             ada6    ONLINE       0     0     0
             ada7    ONLINE       0     0     0
           raidz1    UNAVAIL      0     0     0  insufficient replicas
             ada0    UNAVAIL      0     0     0  cannot open
             ada1    UNAVAIL      0     0     0  cannot open
             ada2    UNAVAIL      0     0     0  cannot open
             ada3    UNAVAIL      0     0     0  cannot open
 0(offsite)#
 
 This is usually easily solved without data loss by making the
 disks available again.  Can you read anything from the disks using
 any program?

That's the strange thing: the disks are readable.  The drive cage just
reset a couple of times prior to the crash. But they seem OK now.  Same
order as well.

# camcontrol devlist
WDC WD\021501FASR\25500W2B0 \200 0956  at scbus0 target 0 lun 0
(pass0,ada0)
WDC WD\021501FASR\25500W2B0 \200 05.01D\0205  at scbus0 target 1 lun 0
(pass1,ada1)
WDC WD\021501FASR\25500W2B0 \200 05.01D\0205  at scbus0 target 2 lun 0
(pass2,ada2)
WDC WD\021501FASR\25500W2B0 \200 05.01D\0205  at scbus0 target 3 lun 0
(pass3,ada3)


# dd if=/dev/ada2 of=/dev/null count=20 bs=1024
20+0 records in
20+0 records out
20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
0(offsite)#

---Mike


Re: [zfs-discuss] multiple disk failure

2011-01-29 Thread Mike Tancsa
On 1/29/2011 11:38 AM, Edward Ned Harvey wrote:
 
 That is precisely the reason why you always want to spread your mirror/raidz
 devices across multiple controllers or chassis.  If you lose a controller or
 a whole chassis, you lose one device from each vdev, and you're able to
 continue production in a degraded state...


Thanks.  These are backups of backups. It would be nice to restore them,
as it will take a while to sync up once again.  But if I need to start
fresh, is there a resource you can point me to with the current best
practices for laying out large storage like this?  It's just for backups
of backups at a DR site.
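
To make the quoted advice about spreading devices across chassis concrete, here is a minimal Python sketch that checks whether a layout survives the loss of any single enclosure. Everything in it is hypothetical: the disk names, the cage names and the two layouts are made up for illustration and are not the actual wiring of this box. The rule it encodes is only that a raidz vdev keeps working while it has lost no more disks than its parity level.

```python
def disks_lost_per_vdev(vdevs, enclosure_of, failed_enclosure):
    """Count, per vdev, how many member disks sit in the failed enclosure."""
    return {
        name: sum(1 for disk in disks if enclosure_of[disk] == failed_enclosure)
        for name, disks in vdevs.items()
    }


def survives_any_single_enclosure_failure(vdevs, parity, enclosure_of):
    """True if no single enclosure failure costs any vdev more disks than
    its parity level (1 for raidz1)."""
    for enclosure in set(enclosure_of.values()):
        lost = disks_lost_per_vdev(vdevs, enclosure_of, enclosure)
        if any(count > parity[name] for name, count in lost.items()):
            return False
    return True


def cage_of(layout):
    """Hypothetical naming rule: the first letter of a disk name is its cage."""
    return {disk: "cage-" + disk[0] for disks in layout.values() for disk in disks}


# Three raidz1 vdevs of four disks each (hypothetical names).
PARITY = {"vdev0": 1, "vdev1": 1, "vdev2": 1}

# Each vdev lives entirely inside one cage.
STACKED = {
    "vdev0": ["a0", "a1", "a2", "a3"],
    "vdev1": ["b0", "b1", "b2", "b3"],
    "vdev2": ["c0", "c1", "c2", "c3"],
}

# Each vdev takes one disk from each of four cages.
SPREAD = {
    "vdev0": ["a0", "b0", "c0", "d0"],
    "vdev1": ["a1", "b1", "c1", "d1"],
    "vdev2": ["a2", "b2", "c2", "d2"],
}

print(survives_any_single_enclosure_failure(STACKED, PARITY, cage_of(STACKED)))
print(survives_any_single_enclosure_failure(SPREAD, PARITY, cage_of(SPREAD)))
```

In the stacked layout one dead cage takes out all four disks of a raidz1 vdev, and with it the pool; in the spread layout each vdev loses one disk and the pool runs degraded.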

---Mike


Re: [zfs-discuss] multiple disk failure

2011-01-29 Thread Mike Tancsa
On 1/29/2011 6:18 PM, Richard Elling wrote:
 0(offsite)#
 
 The next step is to run zdb -l and look for all 4 labels. Something like:
   zdb -l /dev/ada2
 
 If all 4 labels exist for each drive and appear intact, then look more closely
 at how the OS locates the vdevs. If you can't solve the UNAVAIL problem,
 you won't be able to import the pool.



Hmmm, doesn't look good on any of the drives.  Before I give up, I will
try the drives in a different cage Monday. Unfortunately, it's 150 km
away from me at our DR site.


# zdb -l /dev/ada0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3
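
With several drives to check, counting labels by eye gets tedious. Here is a minimal Python sketch that summarizes saved zdb -l output per label. It assumes only the text layout shown in this thread (a "LABEL n" header followed by either key=value lines or "failed to unpack label n"); the function name and sample string are made up for illustration.

```python
import re


def label_report(zdb_output):
    """Map label number -> True if key=value content was seen for it,
    False if it failed to unpack or had no content."""
    report = {}
    current = None
    for line in zdb_output.splitlines():
        text = line.strip()
        header = re.fullmatch(r"LABEL (\d+)", text)
        if header:
            current = int(header.group(1))
            report[current] = False  # stays False until content is seen
        elif current is None:
            continue
        elif text.startswith("failed to unpack label"):
            report[current] = False
        elif re.match(r"\w+=", text):
            report[current] = True
    return report


# Shaped like the output for /dev/ada0 above.
FAILED_SAMPLE = """LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3
"""

report = label_report(FAILED_SAMPLE)
print(sum(report.values()), "of", len(report), "labels readable")
```

A healthy member disk should report all 4 labels readable; the sample above reports none.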


[zfs-discuss] multiple disk failure

2011-01-28 Thread Mike Tancsa
Hi,
I am using FreeBSD 8.2 and went to add 4 new disks today to expand my
offsite storage.  All was working fine for about 20 min, and then the new
drive cage started to fail.  Silly me for assuming new hardware would be
fine :(

When the cage failed, it hung the server and the box rebooted.  After it
rebooted, the entire pool is gone and in the state below.  I had only
written a few files to the new, larger pool and I am not concerned about
restoring that data.  However, is there a way to get back the original
pool data?

BTW, going to http://www.sun.com/msg/ZFS-8000-3C (the page listed in the
output below) gives a 503 error.


0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open
0(offsite)#