Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Brandon High
On Mon, Apr 14, 2008 at 9:41 PM, Tim [EMAIL PROTECTED] wrote:
 I'm sure you're already aware, but if not, 22 drives in a raid-6 is
 absolutely SUICIDE when using SATA disks.  12 disks is the upper end of what
 you want even with raid-6.  The odds of you losing data in a 22 disk raid-6
 is far too great to be worth it if you care about your data.  /rant

Funny, I was thinking the same thing!

I think NetApp says to use 14 disk stripes with their double parity,
arguing that double parity across 14 disks is better protection than
two single parity stripes of 7.

The other thought I had was whether ZFS would have worked for him, but it
sounds like he's a Windows guy.

... and to threadjack, has there been any talk of a Windows ZFS driver?

-B

-- 
Brandon High [EMAIL PROTECTED]
The good is the enemy of the best. - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool create privileges

2008-04-15 Thread Marco Sommella
Hi everyone, I'm new.

I need to create a ZFS filesystem in my home dir as a normal user. I added the
sys_mount and sys_config privileges to my user account in /etc/user_attr. I
executed:

/usr/sbin/mkfile 100M tank_file

/sbin/zpool create -R /export/home/marco/tank tank
/export/home/marco/tank_file

 

It creates the pool, but it can't be mounted.

 

Executing the 2nd command prefixed with ppriv -De, I receive: missing
privilege ALL.

 

Can someone help me?

TNX

 

Marco Sommella
[EMAIL PROTECTED] (E-mail & MSN)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How many ZFS pools is it sensible to use on a single server?

2008-04-15 Thread David Collier-Brown
  We've discussed this in considerable detail, but the original
question remains unanswered:  if an organization *must* use
multiple pools, is there an upper bound to avoid or a rate
of degradation to be considered?

--dave
-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
[EMAIL PROTECTED] |  -- Mark Twain
(905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
bridge: (877) 385-4099 code: 506 9191#
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How many ZFS pools is it sensible to use on a single server?

2008-04-15 Thread Mike Gerdts
On Tue, Apr 15, 2008 at 9:22 AM, David Collier-Brown [EMAIL PROTECTED] wrote:
   We've discussed this in considerable detail, but the original
  question remains unanswered:  if an organization *must* use
  multiple pools, is there an upper bound to avoid or a rate
  of degradation to be considered?

I have a keen interest in this as well.  I would really like zones to
be able to independently fail over between hosts in a zone farm.  The
work coming out of the Indiana, IPS, Caiman, etc. projects implies that
zones will have to be on ZFS.  In order to fail zones over between
systems independently, I either need a zpool per zone or per-dataset
replication.  Considering that 20+ zones on a T2000 is quite feasible
with some workloads, a T5240 could be pushing 80+ zones, and thus a
relatively large number of zpools.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Jeremy F.
This may be my ignorance, but I thought all modern unix filesystems created 
sparse files in this way?


-Original Message-
From: Stuart Anderson [EMAIL PROTECTED]

Date: Mon, 14 Apr 2008 15:45:03 
To:Luke Scharf [EMAIL PROTECTED]
Cc:zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Confused by compressratio


On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
 Stuart Anderson wrote:
 On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
   
 Stuart Anderson wrote:
 
 As an artificial test, I created a filesystem with compression enabled
 and ran mkfile 1g and the reported compressratio for that filesystem
 is 1.00x even though this 1GB file uses only 1kB.
  
   
 ZFS seems to treat files filled with zeroes as sparse files, regardless 
 of whether or not compression is enabled.  Try dd if=/dev/urandom 
 of=1g.dat bs=1024 count=1048576 to create a file that won't exhibit 
 this behavior.  Creating this file is a lot slower than writing zeroes 
 (mostly due to the speed of the urandom device), but ZFS won't treat it 
 like a sparse file, and it won't compress very well either.
 
 
 However, I am still trying to reconcile the compression ratio as
 reported by compressratio vs the ratio of file sizes to disk blocks
 used (whether or not ZFS is creating sparse files).
   
 
 Can you describe the data you're storing a bit?  Any big disk images?
 

Understanding the mkfile case would be a start, but the initial filesystem
that started my confusion is one that has a number of ~50GByte mysql database
files as well as a number of application code files.

Here is another simple test to avoid any confusion/bugs related to NULL
character sequences being compressed to nothing versus being treated
as sparse files. In particular, a 2GByte file full of the output of
/bin/yes:

zfs create export-cit/compress
cd /export/compress
/bin/df -k .
Filesystemkbytesused   avail capacity  Mounted on
export-cit/compress  1704858624  55 1261199742 1%/export/compress
zfs get compression export-cit/compress
NAME PROPERTY VALUESOURCE
export-cit/compress  compression  on   inherited from export-cit
/bin/yes | head -1073741824 > yes.dat
/bin/ls -ls yes.dat
185017 -rw-r--r--   1 root root 2147483648 Apr 14 15:31 yes.dat
/bin/df -k .
Filesystemkbytesused   avail capacity  Mounted on
export-cit/compress  1704858624   92563 1261107232 1%/export/compress
zfs get compressratio export-cit/compress
NAME PROPERTY   VALUESOURCE
export-cit/compress  compressratio  28.39x   -

So compressratio reports 28.39, but the ratio of file size to used disk for
the only regular file on this filesystem, i.e., excluding the initial 55kB
allocated for the empty filesystem is:

2147483648 / (185017 * 512) = 22.67


Calculated another way from zfs list for the entire filesystem:

zfs list /export/compress
NAME  USED  AVAIL  REFER  MOUNTPOINT
export-cit/compress  90.4M  1.17T  90.4M  /export/compress

is 2GB/90.4M = 2048 / 90.4 = 22.65


That still leaves me puzzled what the precise definition of compressratio is?


Thanks.

---
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Luke Scharf
You can fill up an ext3 filesystem with the following command:
dd if=/dev/zero of=delme.dat
You can't really fill up a ZFS filesystem that way.  I guess you could, 
but I've never had the patience -- when several GB worth of zeroes takes 
1kB worth of data, it would take a very long time.

AFAIK, ext3 supports sparse files just like it should -- but it doesn't 
dynamically figure out what to write based on the contents of the file.

-Luke

Jeremy F. wrote:
 This may be my ignorance, but I thought all modern unix filesystems created 
 sparse files in this way?


 -Original Message-
 From: Stuart Anderson [EMAIL PROTECTED]

 Date: Mon, 14 Apr 2008 15:45:03 
 To:Luke Scharf [EMAIL PROTECTED]
 Cc:zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Confused by compressratio


 On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
   
 Stuart Anderson wrote:
 
 On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
  
   
 Stuart Anderson wrote:

 
 As an artificial test, I created a filesystem with compression enabled
 and ran mkfile 1g and the reported compressratio for that filesystem
 is 1.00x even though this 1GB file only uses only 1kB.

  
   
 ZFS seems to treat files filled with zeroes as sparse files, regardless 
 of whether or not compression is enabled.  Try dd if=/dev/urandom 
 of=1g.dat bs=1024 count=1048576 to create a file that won't exhibit 
 this behavior.  Creating this file is a lot slower than writing zeroes 
 (mostly due to the speed of the urandom device), but ZFS won't treat it 
 like a sparse file, and it won't compress very well either.

 
 However, I am still trying to reconcile the compression ratio as
 reported by compressratio vs the ratio of file sizes to disk blocks
 used (whether or not ZFS is creating sparse files).
  
   
 Can you describe the data you're storing a bit?  Any big disk images?

 

 Understanding the mkfile case would be a start, but the initial filesystem
 that started my confusion is one that has a number of ~50GByte mysql database
 files as well as a number of application code files.

 Here is another simple test to avoid any confusion/bugs related to NULL
 character sequeneces being compressed to nothing versus being treated
 as sparse files. In particular, a 2GByte file full of the output of
 /bin/yes:

   
 zfs create export-cit/compress
 cd /export/compress
 /bin/df -k .
 
 Filesystemkbytesused   avail capacity  Mounted on
 export-cit/compress  1704858624  55 1261199742 1%/export/compress
   
 zfs get compression export-cit/compress
 
 NAME PROPERTY VALUESOURCE
 export-cit/compress  compression  on   inherited from 
 export-cit
   
 /bin/yes | head -1073741824 > yes.dat
 /bin/ls -ls yes.dat
 
 185017 -rw-r--r--   1 root root 2147483648 Apr 14 15:31 yes.dat
   
 /bin/df -k .
 
 Filesystemkbytesused   avail capacity  Mounted on
 export-cit/compress  1704858624   92563 1261107232 1%/export/compress
   
 zfs get compressratio export-cit/compress
 
 NAME PROPERTY   VALUESOURCE
 export-cit/compress  compressratio  28.39x   -

 So compressratio reports 28.39, but the ratio of file size to used disk for
 the only regular file on this filesystem, i.e., excluding the initial 55kB
 allocated for the empty filesystem is:

 2147483648 / (185017 * 512) = 22.67


 Calculated another way from zfs list for the entire filesystem:

   
 zfs list /export/compress
 
 NAME  USED  AVAIL  REFER  MOUNTPOINT
 export-cit/compress  90.4M  1.17T  90.4M  /export/compress

 is 2GB/90.4M = 2048 / 90.4 = 22.65


 That still leaves me puzzled what the precise definition of compressratio is?


 Thanks.

 ---
 Stuart Anderson  [EMAIL PROTECTED]
 http://www.ligo.caltech.edu/~anderson
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Luke Scharf wrote:

 AFAIK, ext3 supports sparse files just like it should -- but it doesn't
 dynamically figure out what to write based on the contents of the file.

Since zfs inspects all data anyway in order to compute the block 
checksum, it can easily know if a block is all zeros.

For ext3, inspecting all blocks for zeros would be viewed as 
unnecessary overhead.
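
To make that concrete, here is a minimal Python sketch of the kind of check a
filesystem can do almost for free once it is already touching every byte of a
block to checksum it. The block size and the SHA-256 stand-in are illustrative
assumptions, not what ZFS actually uses internally.

import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative block size, not necessarily ZFS's recordsize

def checksum_and_classify(block: bytes):
    # Checksum a block and, since every byte is being read anyway,
    # note whether it is all zeros (a candidate for storing as a hole).
    is_all_zero = block == b"\x00" * len(block)   # cheap once the data is in hand
    digest = hashlib.sha256(block).hexdigest()    # stand-in for the block checksum
    return digest, is_all_zero

# A zero-filled block could be recorded as a hole; a data block must be written.
zero_block = b"\x00" * BLOCK_SIZE
data_block = b"y\n" * (BLOCK_SIZE // 2)
print(checksum_and_classify(zero_block)[1])  # True
print(checksum_and_classify(data_block)[1])  # False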

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Tim
On Tue, Apr 15, 2008 at 10:09 AM, Maurice Volaski [EMAIL PROTECTED]
wrote:

 I have 16 disks in RAID 5 and I'm not worried.

 I'm sure you're already aware, but if not, 22 drives in a raid-6 is
 absolutely SUICIDE when using SATA disks.  12 disks is the upper end of
 what
 you want even with raid-6.  The odds of you losing data in a 22 disk
 raid-6
 is far too great to be worth it if you care about your data.  /rant

 --

 Maurice Volaski, [EMAIL PROTECTED]
 Computing Support, Rose F. Kennedy Center
 Albert Einstein College of Medicine of Yeshiva University
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



You could also be driving your car down the freeway at 100mph drunk, high,
and without a seatbelt on and not be worried.  The odds will still be
horribly against you.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Keith Bierman


On Apr 15, 2008, at 10:58 AM, Tim wrote:




On Tue, Apr 15, 2008 at 10:09 AM, Maurice Volaski  
[EMAIL PROTECTED] wrote:

I have 16 disks in RAID 5 and I'm not worried.

I'm sure you're already aware, but if not, 22 drives in a raid-6 is
absolutely SUICIDE when using SATA disks.  12 disks is the upper  
end of what
you want even with raid-6.  The odds of you losing data in a 22  
disk raid-6

is far too great to be worth it if you care about your data.  /rant


You could also be driving your car down the freeway at 100mph  
drunk, high, and without a seatbelt on and not be worried.  The  
odds will still be horribly against you.




Perhaps providing the computations rather than the conclusions would  
be more persuasive  on a technical list ;


--
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Jacob Ritorto
Right: a nice depiction of the failure modes involved and their
probabilities, based on typical published MTBF figures for the components
and other arguments/caveats, please.  Does anyone have the cycles to
actually illustrate this, or URLs to such studies?

On Tue, Apr 15, 2008 at 1:03 PM, Keith Bierman [EMAIL PROTECTED] wrote:


 On Apr 15, 2008, at 10:58 AM, Tim wrote:



 On Tue, Apr 15, 2008 at 10:09 AM, Maurice Volaski [EMAIL PROTECTED]
 wrote:
  I have 16 disks in RAID 5 and I'm not worried.
 
 
  I'm sure you're already aware, but if not, 22 drives in a raid-6 is
  absolutely SUICIDE when using SATA disks.  12 disks is the upper end of
 what
  you want even with raid-6.  The odds of you losing data in a 22 disk
 raid-6
  is far too great to be worth it if you care about your data.  /rant
 


 You could also be driving your car down the freeway at 100mph drunk, high,
 and without a seatbelt on and not be worried.  The odds will still be
 horribly against you.


 Perhaps providing the computations rather than the conclusions would be more
 persuasive  on a technical list ;

 --
 Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
 5430 Nassau Circle East  |
 Cherry Hills Village, CO 80113   | 303-997-2749
 speaking for myself* Copyright 2008





 ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Keith Bierman wrote:

 Perhaps providing the computations rather than the conclusions would be more 
 persuasive  on a technical list ;

No doubt.  The computations depend considerably on the size of the 
disk drives involved.  The odds of experiencing media failure on a 
single 1TB SATA disk are quite high.  Consider that this media failure 
may occur while attempting to recover from a failed disk.  There have 
been some good articles on this in USENIX Login magazine.

ZFS raidz1 and raidz2 are NOT directly equivalent to RAID5 and RAID6 
so the failure statistics would be different.  Regardless, single disk 
failure in a raidz1 substantially increases the risk that something 
won't be recoverable if there is a media failure while rebuilding. 
Since ZFS duplicates its own metadata blocks, it is most likely that 
some user data would be lost but the pool would otherwise recover.  If 
a second disk drive completely fails, then you are toast with raidz1.

RAID5 and RAID6 rebuild the entire disk while raidz1 and raidz2 only 
rebuild existing data blocks so raidz1 and raidz2 are less likely to 
experience media failure if the pool is not full.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Tim
On Tue, Apr 15, 2008 at 12:03 PM, Keith Bierman [EMAIL PROTECTED] wrote:


 On Apr 15, 2008, at 10:58 AM, Tim wrote:



 On Tue, Apr 15, 2008 at 10:09 AM, Maurice Volaski [EMAIL PROTECTED]
 wrote:

  I have 16 disks in RAID 5 and I'm not worried.
 
  I'm sure you're already aware, but if not, 22 drives in a raid-6 is
  absolutely SUICIDE when using SATA disks.  12 disks is the upper end of
  what
  you want even with raid-6.  The odds of you losing data in a 22 disk
  raid-6
  is far too great to be worth it if you care about your data.  /rant
 


 You could also be driving your car down the freeway at 100mph drunk, high,
 and without a seatbelt on and not be worried.  The odds will still be
 horribly against you.


 Perhaps providing the computations rather than the conclusions would be
 more persuasive  on a technical list ;

 --
 Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
 5430 Nassau Circle East  |
 Cherry Hills Village, CO 80113   | 303-997-2749
 speaking for myself* Copyright 2008






What fun is that?  ;)

http://blogs.netapp.com/dave/2006/03/expect_double_d.html

There's a layman's explanation.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Maurice Volaski
Perhaps providing the computations rather than the conclusions would
be more persuasive  on a technical list ;

2 16-disk SATA arrays in RAID 5
2 16-disk SATA arrays in RAID 6
1 9-disk SATA array in RAID 5.

4 drive failures over 5 years. Of course, YMMV, especially if you 
drive drunk :-)
-- 

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Maurice Volaski wrote:
 4 drive failures over 5 years. Of course, YMMV, especially if you
 drive drunk :-)

Note that there is a difference between drive failure and media data 
loss. In a system which has been running fine for a while, the chance 
of a second drive failing during rebuild may be low, but the chance of 
block-level media failure is not.

However, computers do not normally run in a vacuum.  Many failures are 
caused by something like a power glitch, temperature cycle, or the 
flap of a butterfly's wings.  Unless your environment is completely 
stable and the devices are not dependent on some of the same things 
(e.g. power supplies, chassis, SATA controller, air conditioning) then 
what caused one device to fail may very well cause another device to 
fail.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Luke Scharf

 zfs list /export/compress
 
   
 NAME  USED  AVAIL  REFER  MOUNTPOINT
 export-cit/compress  90.4M  1.17T  90.4M  /export/compress

 is 2GB/90.4M = 2048 / 90.4 = 22.65


 That still leaves me puzzled what the precise definition of compressratio is?
 

My guess is that the compressratio doesn't include any of those runs of 
null characters that weren't actually written to the disk.

What I'm thinking is that if you have a disk-image (of a new computer) 
in there, the 4GB worth of actual data is counted against the 
compressratio, but the 36GB worth of empty (zeroed) space isn't counted.

But I don't have hard numbers, or a good way to prove it.  Not without 
reading all of the OP's data, anyway...  :-)

-Luke

P.S.  This "don't bother writing zeroes" behavior is wonderful when 
working with Xen disk images.  I'm a fan!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Luke Scharf
Maurice Volaski wrote:
 Perhaps providing the computations rather than the conclusions would
 be more persuasive  on a technical list ;
 

 2 16-disk SATA arrays in RAID 5
 2 16-disk SATA arrays in RAID 6
 1 9-disk SATA array in RAID 5.

 4 drive failures over 5 years. Of course, YMMV, especially if you 
 drive drunk :-)
   

My mileage does vary!

On a 4 year old 84 disk array (with 12 RAID 5s), I replace one drive 
every couple of weeks (on average).  This array lives in a proper 
machine-room with good power and cooling.  The array stays active, though.

-Luke

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Stuart Anderson
On Tue, Apr 15, 2008 at 01:37:43PM -0400, Luke Scharf wrote:
 
 zfs list /export/compress
 
   
 NAME  USED  AVAIL  REFER  MOUNTPOINT
 export-cit/compress  90.4M  1.17T  90.4M  /export/compress
 
 is 2GB/90.4M = 2048 / 90.4 = 22.65
 
 
 That still leaves me puzzled what the precise definition of compressratio 
 is?
 
 
 My guess is that the compressratio doesn't include any of those runs of 
 null characaters that weren't actually written to the disk.

This test was done with a file created via /bin/yes | head, i.e., it does
not have any null characters, specifically to rule out this possibility.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Keith Bierman

On Apr 15, 2008, at 11:18 AM, Bob Friesenhahn wrote:
 On Tue, 15 Apr 2008, Keith Bierman wrote:

 Perhaps providing the computations rather than the conclusions  
 would be more persuasive  on a technical list ;

 No doubt.  The computations depend considerably on the size of the  
 disk drives involved.  The odds of experiencing media failure on a  
 single 1TB SATA disk are quite high.  Consider that this media  
 failure may occur while attempting to recover from a failed disk.   
 There have been some good articles on this in USENIX Login magazine.

 ZFS raidz1 and raidz2 are NOT directly equivalent to RAID5 and  
 RAID6 so the failure statistics would be different.  Regardless,  
 single disk failure in a raidz1 substantially increases the risk  
 that something won't be recoverable if there is a media failure  
 while rebuilding. Since ZFS duplicates its own metadata blocks, it  
 is most likely that some user data would be lost but the pool would  
 otherwise recover.  If a second disk drive completely fails, then  
 you are toast with raidz1.

 RAID5 and RAID6 rebuild the entire disk while raidz1 and raidz2  
 only rebuild existing data blocks so raidz1 and raidz2 are less  
 likely to experience media failure if the pool is not full.

Indeed; but worked illustrative examples are apt to be more helpful  
than blanket pronouncements ;

-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Jonathan Loran


Luke Scharf wrote:
 Maurice Volaski wrote:
   
 Perhaps providing the computations rather than the conclusions would
 be more persuasive  on a technical list ;
 
   
 2 16-disk SATA arrays in RAID 5
 2 16-disk SATA arrays in RAID 6
 1 9-disk SATA array in RAID 5.

 4 drive failures over 5 years. Of course, YMMV, especially if you 
 drive drunk :-)
   
 

 My mileage does vary!

 On a 4 year old 84 disk array (with 12 RAID 5s), I replace one drive 
 every couple of weeks (on average).  This array lives in a proper 
 machine-room with good power and cooling.  The array stays active, though.

 -Luke
   

I basically agree with this.  We have about 150TB in mostly RAID 5 
configurations, ranging from 8 to 16 disks per volume.  We also replace 
bad drives about every week or three, but in six years, have never lost 
an array.  I think our secret is this: on our 3ware controllers we run 
a verify at a minimum of three times a week.  The verify will read the 
whole array (data and parity), find bad blocks and move them if 
necessary to good media.  Because of this, we've never had a rebuild 
trigger a secondary failure.  knock wood.  Our server room has 
conditioned power and cooling as well.

Jon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Will ZFS employ raid0 stripes in an ordinary storage pool?

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Brandon High wrote:

 I think RAID-Z is different, since the stripe needs to spread across
 all devices for protection. I'm not sure how it's done.

My understanding is that RAID-Z is indeed different and does NOT have 
to spread across all devices for protection.  It can use less than the 
total available devices and since parity is distributed the parity 
could be written to any drive.

I am sure that someone will correct me if the above is wrong.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Daniel Rock
Tim wrote:
 I'm sure you're already aware, but if not, 22 drives in a raid-6 is 
 absolutely SUICIDE when using SATA disks.  12 disks is the upper end of 
 what you want even with raid-6.  The odds of you losing data in a 22 
 disk raid-6 is far too great to be worth it if you care about your 
 data.  /rant

Let's do some calculations. Based on the specs of

http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_es_2.pdf

AFR = 0.73%
BER = 1:10^15

22 disk RAID-6 with 1TB disks.

- The probability of a disk failure is 16.06% p.a.
   (0.73% * 22)
- let's assume one day array rebuild time (22 * 1TB / 300MB/s)
- This means the probability of another disk failing during the rebuild
   onto the hot spare is 0.042%
   (0.73% * (1/365) * 21)
- If a second disk fails, there is a 16% chance of an unrecoverable
   read error on the remaining 20 disks
   (8 * 20 * 10^12 / 10^15)

So the probability for a data loss is:

16.06% * 0.042% * 16.0% = 0.001% p.a.

(a little bit higher, actually, since I haven't accounted for 3 or more 
failing disks). The calculations assume independent failure probabilities 
for each disk and correct numbers for AFR and BER. In reality I find the 
vendors' AFR rates way too optimistic, but the BER rate too 
pessimistic.

If we calculate with

AFR = 3%
BER = same

we end up with a data loss probability of 0.018% p.a.
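
For anyone who wants to play with the numbers, the back-of-the-envelope model
above can be reproduced in a few lines of Python. This is only a sketch of the
same three-step estimate (independent failures, a one-day rebuild window, and
only the two-disk-loss path), not a general reliability calculator.

def raid6_data_loss_per_year(afr, ber_bits, n_disks, disk_tb, rebuild_days):
    # 1) some disk in the array fails during the year
    p_first = afr * n_disks
    # 2) a second disk fails during the rebuild window (n-1 remaining disks)
    p_second = afr * (rebuild_days / 365.0) * (n_disks - 1)
    # 3) an unrecoverable read error while reading the remaining n-2 disks
    p_ure = 8 * disk_tb * 1e12 * (n_disks - 2) / ber_bits
    return p_first * p_second * p_ure

# Numbers from the Barracuda ES.2 datasheet cited above
print(raid6_data_loss_per_year(0.0073, 1e15, 22, 1, 1))  # ~1.1e-05, i.e. ~0.001% p.a.
# Same calculation with the more pessimistic 3% AFR
print(raid6_data_loss_per_year(0.03, 1e15, 22, 1, 1))    # ~1.8e-04, i.e. ~0.018% p.a.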



Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Blake Irvin
Truly :)

I was planning something like 3 pools concatenated.  But we are only populating 
12 bays at the moment.

Blake
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Periodic flush

2008-04-15 Thread Mark Maybee
ZFS has always done a certain amount of write throttling.  In the past
(or the present, for those of you running S10 or pre build 87 bits) this
throttling was controlled by a timer and the size of the ARC: we would
cut a transaction group every 5 seconds based off of our timer, and
we would also cut a transaction group if we had more than 1/4 of the
ARC size worth of dirty data in the transaction group.  So, for example,
if you have a machine with 16GB of physical memory it wouldn't be
unusual to see an ARC size of around 12GB.  This means we would allow
up to 3GB of dirty data into a single transaction group (if the writes
complete in less than 5 seconds).  Now we can have up to three
transaction groups in progress at any time: open context, quiesce
context, and sync context.  As a final wrinkle, we also don't allow more
than 1/2 the ARC to be composed of dirty write data.  All taken
together, this means that there can be up to 6GB of writes in the pipe
(using the 12GB ARC example from above).

Problems with this design start to show up when the write-to-disk
bandwidth can't keep up with the application: if the application is
writing at a rate of, say, 1GB/sec, it will fill the pipe within
6 seconds.  But if the IO bandwidth to disk is only 512MB/sec, it's
going to take 12sec to get this data onto the disk.  This impedance
mis-match is going to manifest as pauses:  the application fills
the pipe, then waits for the pipe to empty, then starts writing again.
Note that this won't be smooth, since we need to complete an entire
sync phase before allowing things to progress.  So you can end up
with IO gaps.  This is probably what the original submitter is
experiencing.  Note there are a few other subtleties here that I
have glossed over, but the general picture is accurate.
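
For a sense of the numbers, here is the arithmetic from the two paragraphs
above as a tiny Python sketch. The 16GB/12GB machine and the 1GB/s vs 512MB/s
rates are just the worked example used above, not measurements.

# Old-style throttle: dirty-data limits derived from the ARC size.
arc_bytes       = 12 * 2**30        # example ARC on a 16GB machine
txg_dirty_limit = arc_bytes // 4    # dirty data allowed into one txg: ~3GB
pipe_capacity   = arc_bytes // 2    # total dirty write data allowed:  ~6GB

app_write_rate  = 1 * 2**30         # application writes 1 GB/s
disk_sync_rate  = 512 * 2**20       # pool can sync 512 MB/s

fill_seconds  = pipe_capacity / app_write_rate    # ~6s to fill the pipe
drain_seconds = pipe_capacity / disk_sync_rate    # ~12s to push it to disk

print(f"per-txg dirty limit: {txg_dirty_limit / 2**30:.0f} GB")
print(f"pipe fills in ~{fill_seconds:.0f}s, drains in ~{drain_seconds:.0f}s; "
      f"the writer stalls for the difference")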

The new write throttle code put back into build 87 attempts to
smooth out the process.  We now measure the amount of time it takes
to sync each transaction group, and the amount of data in that group.
We dynamically resize our write throttle to try to keep the sync
time constant (at 5secs) under write load.  We also introduce
fairness delays on writers when we near pipeline capacity: each
write is delayed 1/100sec when we are about to fill up.  This
prevents a single heavy writer from starving out occasional
writers.  So instead of coming to an abrupt halt when the pipeline
fills, we slow down our write pace.  The result should be a constant
even IO load.

There is one down side to this new model: if a write load is very
bursty, e.g., a large 5GB write followed by 30secs of idle, the
new code may be less efficient than the old.  In the old code, all
of this IO would be let in at memory speed and then more slowly make
its way out to disk.  In the new code, the writes may be slowed down.
The data makes its way to the disk in the same amount of time, but
the application takes longer.  Conceptually: we are sizing the write
buffer to the pool bandwidth, rather than to the memory size.

Robert Milkowski wrote:
 Hello eric,
 
 Thursday, March 27, 2008, 9:36:42 PM, you wrote:
 
 ek On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
 On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
 This causes the sync to happen much faster, but as you say,  
 suboptimal.
 Haven't had the time to go through the bug report, but probably
 CR 6429205 each zpool needs to monitor its throughput
 and throttle heavy writers
 will help.
 I hope that this feature is implemented soon, and works well. :-)
 
 ek Actually, this has gone back into snv_87 (and no we don't know which  
 ek s10uX it will go into yet).
 
 
 Could you share more details how it works right now after change?
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ? zfs import with lots of volumes ?

2008-04-15 Thread Ulrich Graef
Does anybody have a guess how long it takes to
import one zpool with lots of LUNs?

Our plan is to use ~ 400 LUNs in raidz 6+1 configuration
in one large pool.

Regards,

Ulrich

-- 
| Ulrich Graef, Senior System Engineer, OS Ambassador\
|  Operating Systems, Performance \ Platform Technology   \
|   Mail: [EMAIL PROTECTED] \ Global Systems Enginering \
|Phone: +49 6103 752 359\ Sun Microsystems Inc  \

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
   D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Repairing known bad disk blocks before zfs encounters them

2008-04-15 Thread David
I have some code that implements background media scanning so I am able to 
detect bad blocks well before zfs encounters them.   I need a script or 
something that will map the known bad block(s) to a logical block so I can 
force zfs to repair the bad block from redundant/parity data.

I can't find anything that isn't part of a draconian scanning/repair mechanism. 
  Granted the zfs architecture can map physical block X to logical block Y, Z, 
and other letters of the alphabet .. but I want to go backwards. 

2nd part of the question: assuming I know /dev/dsk/c0t0d0 has an ECC error on 
block n, and I now have the appropriate storage pool info & offset that 
corresponds to that block, how do I force the file system to repair the 
offending block?

This was easy to address on Linux, assuming the filesystem was built on the 
/dev/md driver, because all I had to do was force a read and twiddle the 
parameters to force a non-cached I/O and a subsequent repair.

It seems as if zfs is too smart for its own good and won't let me fix 
something that I know is bad before zfs has a chance to discover it for 
itself.   :)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Periodic flush

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Mark Maybee wrote:
 going to take 12sec to get this data onto the disk.  This impedance
 mis-match is going to manifest as pauses:  the application fills
 the pipe, then waits for the pipe to empty, then starts writing again.
 Note that this won't be smooth, since we need to complete an entire
 sync phase before allowing things to progress.  So you can end up
 with IO gaps.  This is probably what the original submitter is

Yes.  With an application which also needs to make best use of 
available CPU, these I/O gaps cut into available CPU time (by 
blocking the process) unless the application uses multithreading and 
an intermediate write queue (more memory) to separate the CPU-centric 
parts from the I/O-centric parts.  While the single-threaded 
application is waiting for data to be written, it is not able to read 
and process more data.  Since reads take time to complete, being 
blocked on write stops new reads from being started so the data is 
ready when it is needed.

 There is one down side to this new model: if a write load is very
 bursty, e.g., a large 5GB write followed by 30secs of idle, the
 new code may be less efficient than the old.  In the old code, all

This is also a common scenario. :-)

Presumably the special slow I/O code would not kick in unless the 
burst was large enough to fill quite a bit of the ARC.

Real time throttling is quite a challenge to do in software.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool create privileges

2008-04-15 Thread Darren J Moffat
Marco Sommella wrote:
 Hi everyone, i’m new.
 
 I need to create a zfs filesystem in my home dir as normal user. I added 
 in /etc/user_attr to my user account sys_mount and sys_config 
 privileges. I executed:

Instead of using RBAC for this, it is much easier and much more flexible 
to use ZFS delegated administration.

# zfs allow -u marco create,mount tank/home/marco

This then allows:

marco$ zfs create tank/home/marco/Documents

See the zfs(1) man page for more details.


-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Flag day: ZFS Boot Support

2008-04-15 Thread Nicolas Williams
On Fri, Apr 11, 2008 at 06:39:00PM -0700, George Wilson wrote:
 Solaris Installation Guide
 System Administration Guide: Basic
 ZFS Administration Guide
 System Administration Guide: Devices and File Systems

Where can I get the updated guides?

 For further information, visit our OpenSolaris page at:
 
 http://opensolaris.org/os/community/zfs/boot/

There's nothing there about how to install ZFS boot systems from this
flag-day onward.

Also, the docs at http://opensolaris.org/os/community/zfs/boot/ don't
mention that builds 86 and 87 are toxic for ZFS boot systems.

How would I install a ZFS boot system with these bits?

Looking at the CRs putback I think the procedure for installing ZFS boot
systems is still much as the old procedure, but modified to: swap and
dump on a zvol, and not to use legacy mounts.  Specific details would be
useful.

Also, will ZFS boot systems built with your bits upgrade correctly once
the install and Live Upgrade phase of ZFS boot delivers?  Or should one
wait before trying ZFS boot?

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 24-port SATA controller options?

2008-04-15 Thread Richard Elling
Bob Friesenhahn wrote:
 On Tue, 15 Apr 2008, Maurice Volaski wrote:
   
 4 drive failures over 5 years. Of course, YMMV, especially if you
 drive drunk :-)
 

 Note that there is a difference between drive failure and media data 
 loss. In a system which has been running fine for a while, the chance 
 of a second drive failing during rebuild may be low, but the chance of 
 block-level media failure is not.
   

I couldn't have said it better myself :-).  The prevailing studies are
clearly showing unrecoverable reads as the most common failure
mode.

 However, computers do not normally run in a vaccum.  Many failures are 
 caused by something like a power glitch, temperature cycle, or the 
 flap of a butterfly's wings.  Unless your environment is completely 
 stable and the devices are not dependent on some of the same things 
 (e.g. power supplies, chassis, SATA controller, air conditioning) then 
 what caused one device to fail may very well cause another device to 
 fail.
   

Add to this manufacturing vintage.  We do see some vintages which
have higher incidence rates than others.  It is not often practical to get
all the disks in a system to be from different vintages, especially on a
system like the X4500.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Will ZFS employ raid0 stripes in an ordinary storage pool?

2008-04-15 Thread Brandon High
On Tue, Apr 15, 2008 at 12:12 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Tue, 15 Apr 2008, Brandon High wrote:
  I think RAID-Z is different, since the stripe needs to spread across
  all devices for protection. I'm not sure how it's done.

  My understanding is that RAID-Z is indeed different and does NOT have to
 spread across all devices for protection.  It can use less than the total
 available devices and since parity is distributed the parity could be
 written to any drive.

I think you're right. The parity information for a block has to be
written to a second (or third for raidz2) vdev to qualify as a full
stripe write, but this is not necessarily writing to all devices in
the zpool.

-B

-- 
Brandon High [EMAIL PROTECTED]
The good is the enemy of the best. - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Richard Elling
UTSL.  compressratio is the ratio of uncompressed bytes to compressed bytes.
http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv

IMHO, you will (almost) never get the same number looking at bytes as you
get from counting blocks.
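
To put the numbers from the thread side by side, here is the arithmetic redone
in Python. The comment about overhead is my reading of the statement above
(byte-level ratio vs. block-level accounting), not a claim about the ZFS source.

# Figures from Stuart's /bin/yes test earlier in the thread.
file_logical_bytes = 2147483648      # yes.dat size from ls -l
file_alloc_blocks  = 185017          # 512-byte blocks from ls -ls
fs_used_mb         = 90.4            # USED from zfs list

ratio_file = file_logical_bytes / (file_alloc_blocks * 512)
ratio_fs   = 2048 / fs_used_mb
print(f"file size / allocated blocks: {ratio_file:.2f}x")   # ~22.67x
print(f"2GB / filesystem USED:        {ratio_fs:.2f}x")     # ~22.65x

# compressratio itself reported 28.39x: per the definition above it compares
# uncompressed bytes to compressed bytes, while the two ratios here compare
# logical file size to allocated blocks, which also carry allocation and
# metadata overhead, so the byte and block views rarely agree exactly.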
 -- richard

Stuart Anderson wrote:
 On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
   
 Stuart Anderson wrote:
 
 On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
  
   
 Stuart Anderson wrote:

 
 As an artificial test, I created a filesystem with compression enabled
 and ran mkfile 1g and the reported compressratio for that filesystem
 is 1.00x even though this 1GB file only uses only 1kB.

  
   
 ZFS seems to treat files filled with zeroes as sparse files, regardless 
 of whether or not compression is enabled.  Try dd if=/dev/urandom 
 of=1g.dat bs=1024 count=1048576 to create a file that won't exhibit 
 this behavior.  Creating this file is a lot slower than writing zeroes 
 (mostly due to the speed of the urandom device), but ZFS won't treat it 
 like a sparse file, and it won't compress very well either.

 
 However, I am still trying to reconcile the compression ratio as
 reported by compressratio vs the ratio of file sizes to disk blocks
 used (whether or not ZFS is creating sparse files).
  
   
 Can you describe the data you're storing a bit?  Any big disk images?

 

 Understanding the mkfile case would be a start, but the initial filesystem
 that started my confusion is one that has a number of ~50GByte mysql database
 files as well as a number of application code files.

 Here is another simple test to avoid any confusion/bugs related to NULL
 character sequeneces being compressed to nothing versus being treated
 as sparse files. In particular, a 2GByte file full of the output of
 /bin/yes:

   
 zfs create export-cit/compress
 cd /export/compress
 /bin/df -k .
 
 Filesystemkbytesused   avail capacity  Mounted on
 export-cit/compress  1704858624  55 1261199742 1%/export/compress
   
 zfs get compression export-cit/compress
 
 NAME PROPERTY VALUESOURCE
 export-cit/compress  compression  on   inherited from 
 export-cit
   
 /bin/yes | head -1073741824 > yes.dat
 /bin/ls -ls yes.dat
 
 185017 -rw-r--r--   1 root root 2147483648 Apr 14 15:31 yes.dat
   
 /bin/df -k .
 
 Filesystemkbytesused   avail capacity  Mounted on
 export-cit/compress  1704858624   92563 1261107232 1%/export/compress
   
 zfs get compressratio export-cit/compress
 
 NAME PROPERTY   VALUESOURCE
 export-cit/compress  compressratio  28.39x   -

 So compressratio reports 28.39, but the ratio of file size to used disk for
 the only regular file on this filesystem, i.e., excluding the initial 55kB
 allocated for the empty filesystem is:

 2147483648 / (185017 * 512) = 22.67


 Calculated another way from zfs list for the entire filesystem:

   
 zfs list /export/compress
 
 NAME  USED  AVAIL  REFER  MOUNTPOINT
 export-cit/compress  90.4M  1.17T  90.4M  /export/compress

 is 2GB/90.4M = 2048 / 90.4 = 22.65


 That still leaves me puzzled what the precise definition of compressratio is?


 Thanks.

 ---
 Stuart Anderson  [EMAIL PROTECTED]
 http://www.ligo.caltech.edu/~anderson
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] First trouble, dying HDD.

2008-04-15 Thread Jorgen Lundman

Today, we found the x4500's NFS had stopped responding for about 1 minute, but 
the server itself was idle. After a little looking around, we found this:

Apr 16 09:16:00 x4500-01.unix fmd: [ID 441519 daemon.error] SUNW-MSG-ID: 
DISK-8000-0X, TYPE: Fault, VER: 1, SEVERITY: Major
Apr 16 09:16:00 x4500-01.unix EVENT-TIME: Wed Apr 16 09:16:00 JST 2008
Apr 16 09:16:00 x4500-01.unix PLATFORM: Sun Fire X4500, CSN: 0738AMT047 
, HOSTNAME: x4500-01.unix
Apr 16 09:16:00 x4500-01.unix SOURCE: eft, REV: 1.16
Apr 16 09:16:00 x4500-01.unix EVENT-ID: 949038eb-f474-421b-bb79-981810c48f25
Apr 16 09:16:00 x4500-01.unix DESC: SMART health-monitoring firmware 
reported that a disk
Apr 16 09:16:00 x4500-01.unix failure is imminent.
Apr 16 09:16:00 x4500-01.unix   Refer to http://sun.com/msg/DISK-8000-0X 
for more information.
Apr 16 09:16:00 x4500-01.unix AUTO-RESPONSE: None.
Apr 16 09:16:00 x4500-01.unix IMPACT: It is likely that the continued 
operation of
Apr 16 09:16:00 x4500-01.unix this disk will result in data loss.
Apr 16 09:16:00 x4500-01.unix REC-ACTION: Schedule a repair procedure to 
replace the affected disk.
Apr 16 09:16:00 x4500-01.unix Use fmdump -v -u EVENT_ID to identify 
the disk.


# fmdump -v -u 949038eb-f474-421b-bb79-981810c48f25

TIME UUID SUNW-MSG-ID
Apr 16 09:16:00.6628 949038eb-f474-421b-bb79-981810c48f25 DISK-8000-0X
   100%  fault.io.disk.predictive-failure

 Problem in: 
hc://:product-id=Sun-Fire-X4500:chassis-id=0738AMT047:server-id=x4500-01.unix:serial=KRVN67ZBHDPX3H:part=HITACHI-HDS7250SASUN500G-0726KDPX3H:revision=K2AOAJ0A/bay=9/disk=0
Affects: 
dev:///:devid=id1,[EMAIL PROTECTED]//[EMAIL PROTECTED],0/pci1022,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
FRU: 
hc://:product-id=Sun-Fire-X4500:chassis-id=0738AMT047:server-id=x4500-01.unix:serial=KRVN67ZBHDPX3H:part=HITACHI-HDS7250SASUN500G-0726KDPX3H:revision=K2AOAJ0A/bay=9/disk=0
   Location: HD_ID_9




The server is back and operating again, so we are happy with how it 
handled the situation. It would have been nice if the URL above had further 
links describing how to remove a HDD from the x4500 and other platforms.

Can I tell zfs to just not use the HDD for now, until we can pull it out 
and replace it? We have 14TB spare, so no risk of running out of space 
right now.

Lund




-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] First trouble, dying HDD.

2008-04-15 Thread Jorgen Lundman

Although, I'm having some issues finding the exact process to go from
fmdump's HD_ID_9 to the c0t?d? style input that zpool offline wants.

I cannot run zpool status or the format command, as they hang. All the 
Sun documentation assumes you already know the c?t?d? disk name. Today, 
it is easier to just pull out the HDD, since the x4500 doesn't want to 
mark it dead.



Jorgen Lundman wrote:
 Today, we foudn the x4500 NFS stopped responding for about 1 minute, but 
 the server itself was idle. After a little looking around, we found this:
 
 Apr 16 09:16:00 x4500-01.unix fmd: [ID 441519 daemon.error] SUNW-MSG-ID: 
 DISK-8000-0X, TYPE: Fault, VER: 1, SEVERITY: Major
 Apr 16 09:16:00 x4500-01.unix EVENT-TIME: Wed Apr 16 09:16:00 JST 2008
 Apr 16 09:16:00 x4500-01.unix PLATFORM: Sun Fire X4500, CSN: 0738AMT047 
 , HOSTNAME: x4500-01.unix
 Apr 16 09:16:00 x4500-01.unix SOURCE: eft, REV: 1.16
 Apr 16 09:16:00 x4500-01.unix EVENT-ID: 949038eb-f474-421b-bb79-981810c48f25
 Apr 16 09:16:00 x4500-01.unix DESC: SMART health-monitoring firmware 
 reported that a disk
 Apr 16 09:16:00 x4500-01.unix failure is imminent.
 Apr 16 09:16:00 x4500-01.unix   Refer to http://sun.com/msg/DISK-8000-0X 
 for more information.
 Apr 16 09:16:00 x4500-01.unix AUTO-RESPONSE: None.
 Apr 16 09:16:00 x4500-01.unix IMPACT: It is likely that the continued 
 operation of
 Apr 16 09:16:00 x4500-01.unix this disk will result in data loss.
 Apr 16 09:16:00 x4500-01.unix REC-ACTION: Schedule a repair procedure to 
 replace the affected disk.
 Apr 16 09:16:00 x4500-01.unix Use fmdump -v -u EVENT_ID to identify 
 the disk.
 
 
 # fmdump -v -u 949038eb-f474-421b-bb79-981810c48f25
 
 TIME UUID SUNW-MSG-ID
 Apr 16 09:16:00.6628 949038eb-f474-421b-bb79-981810c48f25 DISK-8000-0X
100%  fault.io.disk.predictive-failure
 
  Problem in: 
 hc://:product-id=Sun-Fire-X4500:chassis-id=0738AMT047:server-id=x4500-01.unix:serial=KRVN67ZBHDPX3H:part=HITACHI-HDS7250SASUN500G-0726KDPX3H:revision=K2AOAJ0A/bay=9/disk=0
 Affects: 
 dev:///:devid=id1,[EMAIL PROTECTED]//[EMAIL PROTECTED],0/pci1022,[EMAIL 
 PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
 FRU: 
 hc://:product-id=Sun-Fire-X4500:chassis-id=0738AMT047:server-id=x4500-01.unix:serial=KRVN67ZBHDPX3H:part=HITACHI-HDS7250SASUN500G-0726KDPX3H:revision=K2AOAJ0A/bay=9/disk=0
Location: HD_ID_9
 
 
 
 
 The server is back and operating again, so we are happy with how it 
 handled it. It would have been nice if the URL above would have further 
 URLs leading to how to remove a HDD from x4500/other platforms.
 
 Can I tell zfs to just not use the HDD for now, until we can pull it out 
 and replace it? We have 14TB spare, so no risk of running out of space 
 right now.
 
 Lund
 
 
 
 

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] First trouble, dying HDD.

2008-04-15 Thread James C. McPherson
Hi Jorgen,

Jorgen Lundman wrote:
 Although, I'm having some issues finding the exact process to go from
 fmdump's HD_ID_9  to zpool offline c0t?d? style input.
 
 I can not run zpool status, nor format command as they hang. All the 
 Sun documentation already assume you know the c?t?d? disk name. Today, 
 it is easier to just pull out the HDD since the x4500 doesn't want to 
 mark it dead.
 
...

 # fmdump -v -u 949038eb-f474-421b-bb79-981810c48f25

 TIME UUID SUNW-MSG-ID
 Apr 16 09:16:00.6628 949038eb-f474-421b-bb79-981810c48f25 DISK-8000-0X
100%  fault.io.disk.predictive-failure

  Problem in: 
 hc://:product-id=Sun-Fire-X4500:chassis-id=0738AMT047:server-id=x4500-01.unix:serial=KRVN67ZBHDPX3H:part=HITACHI-HDS7250SASUN500G-0726KDPX3H:revision=K2AOAJ0A/bay=9/disk=0
 Affects: 
 dev:///:devid=id1,[EMAIL PROTECTED]//[EMAIL PROTECTED],0/pci1022,[EMAIL 
 PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
 FRU: 
 hc://:product-id=Sun-Fire-X4500:chassis-id=0738AMT047:server-id=x4500-01.unix:serial=KRVN67ZBHDPX3H:part=HITACHI-HDS7250SASUN500G-0726KDPX3H:revision=K2AOAJ0A/bay=9/disk=0
Location: HD_ID_9

You get three extra pieces of information in the fmdump output:
the device path, the devid and from that the device manufacturer
model number and serial number.

The device path is
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0
which you can grep for in /etc/path_to_inst and then map using
the output from iostat -En.

The devid is the unique device identifier and this shows up
in the output from prtpicl -v and prtconf -v. Both of these
utilities should also then show you the devfs-path property
which you should be able to use to map to a cXtYdZ number.

Finally, you can see that you've got a Hitachi HDS7250S
with serial number KRVN67ZBHDPX3H - this will definitely be
reported in your iostat -En output.
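
As an illustration of that last step, here is a small Python sketch that scans
iostat -En output for a serial number and prints the matching cXtYdZ name. The
parsing is a best-effort assumption; the exact layout of iostat -En output can
vary between releases, so check it against your own system first.

import re
import subprocess

def disk_for_serial(serial):
    # Walk `iostat -En` records; each record starts with the device name
    # (e.g. c0t1d0) and a later line carries "Serial No: ...".
    out = subprocess.run(["iostat", "-En"], capture_output=True, text=True).stdout
    current_dev = None
    for line in out.splitlines():
        m = re.match(r"^(c\d+t\d+d\d+)\s", line)
        if m:
            current_dev = m.group(1)
        elif serial in line:
            return current_dev
    return None

print(disk_for_serial("KRVN67ZBHDPX3H"))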


cheers,
James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
   http://www.jmcp.homeunix.com/blog
   http://blogs.sun.com/jmcp
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss