Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)

2011-02-06 Thread Richard Elling
On Feb 6, 2011, at 6:45 PM, Matthew Angelo wrote:

> I require a new high capacity 8 disk zpool.  The disks I will be
> purchasing (Samsung or Hitachi) have an Error Rate (non-recoverable,
> bits read) of 1 in 10^14 and will be 2TB.  I'm staying clear of WD
> because they have the new 4096-byte (4K) sectors, which don't play nice with ZFS
> at the moment.
> 
> My question is, how do I determine which of the following zpool and
> vdev configuration I should run to maximize space whilst mitigating
> rebuild failure risk?

The MTTDL[2] model will work.
http://blogs.sun.com/relling/entry/a_story_of_two_mttdl
As described, this model doesn't scale well for N > 3 or 4, but it will get
you in the ballpark.

You will also need to know the MTBF from the data sheet, but if you
don't have that info, that is ok because you are asking the right question:
given a single drive type, what is the best configuration for preventing
data loss. Finally, to calculate the raidz2 result, you need to know the 
mean time to recovery (MTTR) which includes the logistical replacement
time and resilver time.

Basically, the model calculates the probability of a data loss event during
reconstruction. This is different for ZFS and most other LVMs because ZFS
will only resilver data and the total data <= disk size.
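
For a rough feel for the numbers, the dominant term for large, slow-rebuilding
drives is the chance of hitting an unrecoverable read error (URE) during
reconstruction. Here is a back-of-the-envelope sketch in C using the 2TB and
1-in-10^14 figures from the original post, assuming the worst case where the
resilver has to read the full raw capacity of every surviving disk in the vdev
(the full MTTDL[2] model in the blog post above also folds in MTBF and MTTR):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double uer_bits  = 1e14;        /* 1 unrecoverable error per 1e14 bits read */
    double disk_bits = 2e12 * 8.0;  /* one 2TB disk, expressed in bits */

    /* bits that must be read to rebuild one failed disk, worst case (vdev full) */
    double read_3p1 = 3.0 * disk_bits;   /* raidz1 3+1: read 3 survivors */
    double read_7p1 = 7.0 * disk_bits;   /* raidz1 7+1: read 7 survivors */

    /* P(at least one URE) = 1 - (1 - 1/UER)^bits, approximately 1 - exp(-bits/UER) */
    printf("raidz1 3+1: %.0f%% chance of a URE during resilver\n",
        100.0 * (1.0 - exp(-read_3p1 / uer_bits)));
    printf("raidz1 7+1: %.0f%% chance of a URE during resilver\n",
        100.0 * (1.0 - exp(-read_7p1 / uer_bits)));
    return 0;
}

Under those assumptions this works out to roughly 38% for the 3+1 vdev and
about 67% for 7+1. With raidz2, a URE hit while rebuilding a single failed
disk is still reconstructable from the second parity, which is where the
extra parity earns its keep.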

> 
> 1. 2x RAIDZ(3+1) vdev
> 2. 1x RAIDZ(7+1) vdev
> 3. 1x RAIDZ2(7+1) vdev
> 
> 
> I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x
> 2TB disks.

Double parity will win over single parity. Intuitively, when you add parity you
multiply the numerator by another factor of MTBF; when you add disks to a set, you
only grow the denominator by a small factor. Obviously the multiplication is a good
thing, the division not so much. In short, raidz2 is the better choice.
 -- richard



Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)

2011-02-06 Thread Matthew Angelo
Yes, I did mean 6+2. Thank you for fixing the typo.

I'm actually leaning more towards running a simple 7+1 RAIDZ1.
Running this with 1TB disks is not a problem, but I just wanted to
investigate at what TB size the "scales would tip".  I understand
RAIDZ2 protects against failures during the rebuild process.  Currently,
my RAIDZ1 takes 24 hours to rebuild a failed disk, so with 2TB disks,
assuming a worst case of 2 days, that is my 'exposure' time.

For example, I would hazard a confident guess that 7+1 RAIDZ1 with 6TB
drives wouldn't be a smart idea.  I'm just trying to extrapolate down
from there.

I will be running a hot (or maybe cold) spare, so I don't need to
factor in "time it takes for the manufacturer to replace the drive".



On Mon, Feb 7, 2011 at 2:48 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Matthew Angelo
>>
>> My question is, how do I determine which of the following zpool and
>> vdev configuration I should run to maximize space whilst mitigating
>> rebuild failure risk?
>>
>> 1. 2x RAIDZ(3+1) vdev
>> 2. 1x RAIDZ(7+1) vdev
>> 3. 1x RAIDZ2(6+2) vdev
>>
>> I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x
>> 2TB disks.
>
> (Corrected typo, 6+2 for you).
> Sounds like you made up your mind already.  Nothing wrong with that.  You
> are apparently uncomfortable running with only 1 disk worth of redundancy.
> There is nothing fundamentally wrong with the raidz1 configuration, but the
> probability of failure is obviously higher.
>
> Question is how do you calculate the probability?  Because if we're talking
> about 5e-21 versus 3e-19 then you probably don't care about the difference...
> They're both essentially zero probability...  Well...  There's no good
> answer to that.
>
> With the cited probability of bit error rate, you're just representing the
> probability of a bit error.  You're not representing the probability of a
> failed drive.  And you're not representing the probability of a drive
> failure within a specified time window.  What you really care about is the
> probability of two drives (or 3 drives) failing concurrently...  In which
> case, you need to model the probability of any one drive failing within a
> specified time window.  And even if you want to model that probability, in
> reality it's not linear.  The probability of a drive failing between 1yr and
> 1yr+3hrs is smaller than the probability of the drive failing between 3yr
> and 3yr+3hrs.  Because after 3yrs, the failure rate will be higher.  So
> after 3 yrs, the probability of multiple simultaneous failures is higher.
>
> I recently saw some Seagate data sheets which specified the annual disk
> failure rate to be 0.3%.  Again, this is a linear model, representing a
> nonlinear reality.
>
> Suppose one disk fails...  How many weeks does it take to get a replacement
> onsite under the 3yr limited mail-in warranty?
>
> But then again after 3 years, you're probably considering this your antique
> hardware, and all the stuff you care about is on a newer server.  Etc.
>
> There's no good answer to your question.
>
> You are obviously uncomfortable with a single disk worth of redundancy.  Go
> with your gut.  Sleep well at night.  It only costs you $100.  You probably
> have a cell phone with no backups worth more than that in your pocket right
> now.
>
>


Re: [zfs-discuss] kernel messages question

2011-02-06 Thread Richard Elling
On Feb 5, 2011, at 2:44 PM, Roy Sigurd Karlsbakk wrote:

> Hi
> 
> I keep getting these messages on this one box. There are issues with at least 
> one of the drives in it, but since there are some 80 drives in it, that's not 
> really an issue. I just want to know, if anyone knows, what this kernel 
> message means. Anyone?
> 
> Feb  5 19:35:57 prv-backup scsi: [ID 365881 kern.info] 
> /pci@7a,0/pci8086,340e@7/pci1000,3140@0 (mpt1):
> Feb  5 19:35:57 prv-backup  Log info 0x3108 received for target 13.

The SAS device at target 13 reported something...
Note: target 13 has nothing to do with the OS enumeration. One method of
finding which disk is assigned to target 13 is to use LSIutil: option #8
shows the SAS address of the device assigned to each target.

> Feb  5 19:35:57 prv-backup  scsi_status=0x0, ioc_status=0x804b, 
> scsi_state=0x0

The command was terminated.  In many cases, this is because a reset was
sent. fmdump -e[V] provides better information on what has transpired.

Note: the device that received a reset might not be the device which is
creating the need for the reset.
 -- richard



Re: [zfs-discuss] Understanding directio, O_DSYNC and zfs_nocacheflush on ZFS

2011-02-06 Thread Richard Elling
On Feb 5, 2011, at 8:10 AM, Yi Zhang wrote:

> Hi all,
> 
> I'm trying to achieve the same effect of UFS directio on ZFS and here
> is what I did:

Solaris UFS directio has three functions:
1. improved async code path
2. multiple concurrent writers
3. no buffering

Of the three, #1 and #2 were designed into ZFS from day 1, so there is nothing
to set or change to take advantage of those features.

> 
> 1. Set the primarycache of zfs to metadata and secondarycache to none,
> recordsize to 8K (to match the unit size of writes)
> 2. Run my test program (code below) with different options and measure
> the running time.
> a) open the file without O_DSYNC flag: 0.11s.
> This doesn't seem like directio is in effect, because I tried on UFS
> and time was 2s. So I went on with more experiments with the O_DSYNC
> flag set. I know that directio and O_DSYNC are two different things,
> but I thought the flag would force synchronous writes and achieve what
> directio does (and more).

Directio and O_DSYNC are two different features.

> b) open the file with O_DSYNC flag: 147.26s

ouch

> c) same as b) but also enabled zfs_nocacheflush: 5.87s

Is your pool created from a single HDD?

> My questions are:
> 1. With my primarycache and secondarycache settings, the FS shouldn't
> buffer reads and writes anymore. Wouldn't that be equivalent to
> O_DSYNC? Why are a) and b) so different?

No. The primarycache and secondarycache settings only control read caching
(the ARC and L2ARC); O_DSYNC deals with when the I/O is committed to media.

> 2. My understanding is that zfs_nocacheflush essentially removes the
> sync command sent to the device, which cancels the O_DSYNC flag. Why
> are b) and c) so different?

No. Disabling the cache flush means that the volatile write buffer in the 
disk is not flushed. In other words, disabling the cache flush is in direct
conflict with the semantics of O_DSYNC.

> 3. Does ZIL have anything to do with these results?

Yes. The ZIL is used for meeting the O_DSYNC requirements.  This has
nothing to do with buffering. More details are in the ZFS Best Practices Guide.
 -- richard

> 
> Thanks in advance for any suggestion/insight!
> Yi
> 
> 
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/time.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char **argv)
> {
>   struct timeval tim;
>   gettimeofday(&tim, NULL);
>   double t1 = tim.tv_sec + tim.tv_usec/1000000.0;  /* microseconds to seconds */
>   char a[8192];
>   int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0660);
>   //int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC|O_DSYNC, 0660);
>   if (argv[2][0] == '1')
>   directio(fd, DIRECTIO_ON);  /* Solaris directio(3C) advisory request */
>   int i;
>   for (i=0; i<10000; ++i)  /* loop count was garbled by the list archive; 10000 8K writes assumed */
>   pwrite(fd, a, sizeof(a), (off_t)i*8192);
>   close(fd);
>   gettimeofday(&tim, NULL);
>   double t2 = tim.tv_sec + tim.tv_usec/1000000.0;
>   printf("%f\n", t2-t1);
> }


Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)

2011-02-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Matthew Angelo
> 
> My question is, how do I determine which of the following zpool and
> vdev configuration I should run to maximize space whilst mitigating
> rebuild failure risk?
> 
> 1. 2x RAIDZ(3+1) vdev
> 2. 1x RAIDZ(7+1) vdev
> 3. 1x RAIDZ2(6+2) vdev
> 
> I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x
> 2TB disks.

(Corrected typo, 6+2 for you).
Sounds like you made up your mind already.  Nothing wrong with that.  You
are apparently uncomfortable running with only 1 disk worth of redundancy.
There is nothing fundamentally wrong with the raidz1 configuration, but the
probability of failure is obviously higher.

Question is how do you calculate the probability?  Because if we're talking
about 5e-21 versus 3e-19 then you probably don't care about the difference...
They're both essentially zero probability...  Well...  There's no good
answer to that.  

With the cited probability of bit error rate, you're just representing the
probability of a bit error.  You're not representing the probability of a
failed drive.  And you're not representing the probability of a drive
failure within a specified time window.  What you really care about is the
probability of two drives (or 3 drives) failing concurrently...  In which
case, you need to model the probability of any one drive failing within a
specified time window.  And even if you want to model that probability, in
reality it's not linear.  The probability of a drive failing between 1yr and
1yr+3hrs is smaller than the probability of the drive failing between 3yr
and 3yr+3hrs.  Because after 3yrs, the failure rate will be higher.  So
after 3 yrs, the probability of multiple simultaneous failures is higher. 

I recently saw some Seagate data sheets which specified the annual disk
failure rate to be 0.3%.  Again, this is a linear model, representing a
nonlinear reality.
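
Just to show the shape of that calculation, here is a toy sketch (plain C)
that treats drive failures as independent events with a constant rate, which,
as noted above, the real world is not. The 0.3% AFR is the data sheet figure
mentioned above; the 48-hour resilver window is an assumption:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double afr       = 0.003;   /* assumed annual failure rate per drive */
    double window_hr = 48.0;    /* assumed resilver/exposure window, in hours */
    int    survivors = 7;       /* remaining drives in an 8-disk raidz1 */

    /* per-drive failure rate per hour under a constant-rate model */
    double lambda = -log(1.0 - afr) / 8760.0;

    /* P(at least one survivor also fails inside the window) */
    double p = 1.0 - exp(-lambda * survivors * window_hr);
    printf("P(second whole-drive failure during resilver) = %.4f%%\n", 100.0 * p);
    return 0;
}

That comes out around 0.01%, which is why, for disks this size, the
unrecoverable-read-error term tends to dominate the whole-drive-failure term.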

Suppose one disk fails...  How many weeks does it take to get a replacement
onsite under the 3yr limited mail-in warranty?

But then again after 3 years, you're probably considering this your antique
hardware, and all the stuff you care about is on a newer server.  Etc.

There's no good answer to your question.  

You are obviously uncomfortable with a single disk worth of redundancy.  Go
with your gut.  Sleep well at night.  It only costs you $100.  You probably
have a cell phone with no backups worth more than that in your pocket right
now.



Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)

2011-02-06 Thread Ian Collins

 On 02/ 7/11 03:45 PM, Matthew Angelo wrote:

I require a new high capacity 8 disk zpool.  The disks I will be
purchasing (Samsung or Hitachi) have an Error Rate (non-recoverable,
bits read) of 1 in 10^14 and will be 2TB.  I'm staying clear of WD
because they have the new 4096-byte (4K) sectors, which don't play nice with ZFS
at the moment.

My question is, how do I determine which of the following zpool and
vdev configuration I should run to maximize space whilst mitigating
rebuild failure risk?

1. 2x RAIDZ(3+1) vdev
2. 1x RAIDZ(7+1) vdev
3. 1x RAIDZ2(7+1) vdev


I assume 3 was 6+2.

A bigger issue than drive error rates is how long a new 2TB drive will 
take to resilver if one dies.  How long are you willing to run without 
redundancy in your pool?


--
Ian.



Re: [zfs-discuss] Repairing Faulted ZFS pool when zbd doesn't recognize the pool as existing

2011-02-06 Thread George Wilson
Chris,

I might be able to help you recover the pool but will need access to your
system. If you think this is possible just ping me off list and let me know.

Thanks,
George


On Sun, Feb 6, 2011 at 4:56 PM, Chris Forgeron  wrote:

> Hello all,
>
>  Long time reader, first time poster.
>
>
>
> I’m on day two of a rather long struggle with ZFS and my data. It seems we
> have a difference of opinion – ZFS doesn’t think I have any, and I’m pretty
> sure I saw a crapload of it just the other day.
>
>
>
> I’ve been researching and following various bits of information that I’ve
> found from so many helpful people on this list, but I’m running into a
> slightly different problem than the rest of you;
>
>
>
> My zdb doesn’t seem to recognize the pool for any command other than zdb -e
>
>
>
> I think my problem is a corrupt set of uberblocks, and if I could go back
> in time a bit, everything would be rosy. But how do you do that when zdb
> doesn’t give you the output that you need?
>
>
>
>
>
> Let’s start at the beginning, as this will be a rather long post. Hopefully
> it will be of use to others in similar situations.
>
>
>
> I was running Solaris Express 11, keeping my pool at v28 so I could
> occasionally switch back into FreeBSD-9-Current for tests, comparisons, etc.
>
>
>
> I’ve built a rather large raidz comprised of 25 1.5 TB drives, organized
> into a striped 5 x 5 drive raidz.
>
>
>
> Friday night, one of the 1.5 TB’s faulted, and the resilvering process
> started to the spare 1.5 TB drive. All was normal.
>
>
>
> In the morning, the resilver was around 86% complete when I started working
> on the CIFS ability of Solaris – I wanted to take it’s authentication from
> Workgroup to Domain mode, and thus I was following procedure on this,
> setting up krb5.conf, etc. I also changed the hostname at this point to
> better label the system.
>
>
>
> I had rebooted once during this, and everything came back up fine. The
> drive was still resilvering. I then went for a second reboot, and when the
> system came back up, I was shocked to see my pool was in a faulted state.
>
>
>
> Here’s a zpool status output from that fateful moment:
>
>
>
> -=-=-=-=-
>
>   pool: tank
>  state: FAULTED
> status: The pool metadata is corrupted and the pool cannot be opened.
> action: Destroy and re-create the pool from
>         a backup source.
>    see: http://www.sun.com/msg/ZFS-8000-72
>   scan: none requested
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         tank             FAULTED       0     0     1  corrupted data
>           raidz1-0       ONLINE        0     0     2
>             c9t0d0       ONLINE        0     0     0
>             c9t0d1       ONLINE        0     0     0
>             c9t0d2       ONLINE        0     0     0
>             c9t0d3       ONLINE        0     0     0
>             c9t0d4       ONLINE        0     0     0
>           raidz1-1       ONLINE        0     0     0
>             c9t1d0       ONLINE        0     0     0
>             c9t1d1       ONLINE        0     0     0
>             c9t1d2       ONLINE        0     0     0
>             c9t1d3       ONLINE        0     0     0
>             c9t1d4       ONLINE        0     0     0
>           raidz1-2       ONLINE        0     0     0
>             c9t2d0       ONLINE        0     0     0
>             c9t2d1       ONLINE        0     0     0
>             c9t2d2       ONLINE        0     0     0
>             c9t2d3       ONLINE        0     0     0
>             c9t2d4       ONLINE        0     0     0
>           raidz1-3       ONLINE        0     0     2
>             c9t3d0       ONLINE        0     0     0
>             c9t3d1       ONLINE        0     0     0
>             c9t3d2       ONLINE        0     0     0
>             c9t3d3       ONLINE        0     0     0
>             c9t3d4       ONLINE        0     0     0
>           raidz1-6       ONLINE        0     0     2
>             c9t4d0       ONLINE        0     0     0
>             c9t4d1       ONLINE        0     0     0
>             c9t4d2       ONLINE        0     0     0
>             c9t4d3       ONLINE        0     0     0
>             replacing-4  ONLINE        0     0     0
>               c9t4d4     ONLINE        0     0     0
>               c9t15d1    ONLINE        0     0     0
>         logs
>           c9t14d0p0      ONLINE        0     0     0
>           c9t14d1p0      ONLINE        0     0     0
>
> -=-=-=-=-=-
>
>
>
> After a “holy crap”, and a check of /tank to see if it really was gone, I
> executed a zpool export and then a zpool import.
>
>
>
> (notice the 2 under the raidz1-3 vdev, as well as the raidz1-6)
>
>
>
> Export worked fine, couldn’t import, as I received an I/O error.
>
>
>
> At this stage, I thought it was something stupid with the resilver being
> jammed, and since I had 4 out of 5 drives functional in my raidz1-6 vdev, I
> figured I’d just remove those

[zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)

2011-02-06 Thread Matthew Angelo
I require a new high capacity 8 disk zpool.  The disks I will be
purchasing (Samsung or Hitachi) have an Error Rate (non-recoverable,
bits read) of 1 in 10^14 and will be 2TB.  I'm staying clear of WD
because they have the new 4096-byte (4K) sectors, which don't play nice with ZFS
at the moment.

My question is, how do I determine which of the following zpool and
vdev configuration I should run to maximize space whilst mitigating
rebuild failure risk?

1. 2x RAIDZ(3+1) vdev
2. 1x RAIDZ(7+1) vdev
3. 1x RAIDZ2(7+1) vdev


I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x
2TB disks.

Cheers


Re: [zfs-discuss] kernel messages question

2011-02-06 Thread Krunal Desai
On Sat, Feb 5, 2011 at 5:44 PM, Roy Sigurd Karlsbakk  wrote:
> Hi
>
> I keep getting these messages on this one box. There are issues with at least 
> one of the drives in it, but since there are some 80 drives in it, that's not 
> really an issue. I just want to know, if anyone knows, what this kernel 
> message means. Anyone?
>
> Feb  5 19:35:57 prv-backup scsi: [ID 365881 kern.info] 
> /pci@7a,0/pci8086,340e@7/pci1000,3140@0 (mpt1):
> Feb  5 19:35:57 prv-backup      Log info 0x3108 received for target 13.
> Feb  5 19:35:57 prv-backup      scsi_status=0x0, ioc_status=0x804b, 
> scsi_state=0x0

I think I got those when I had a loose cable on the backplane (aka,
physical medium errors).


Re: [zfs-discuss] Identifying drives (SATA)

2011-02-06 Thread Orvar Korvar
Roy, I read your question on the OpenIndiana mailing list: how can you rebalance your 
huge raid without implementing block pointer rewrite? You have an old vdev 
full of data, and now you have added a new vdev - and you want the data to be 
evenly spread out across all vdevs.

I answer here because it is easier for me than mailing OpenIndiana.

I think it should work to create a new ZFS filesystem, which will reside on all 
vdevs, and then move your old data to the new filesystem. Then all data will be 
evenly spread out.
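
For example, something like this should do it (untested sketch; tank/old and
tank/new are placeholder dataset names, and you need enough free space to hold
a second copy of the data while it moves):

  zfs snapshot tank/old@move
  zfs send tank/old@move | zfs receive tank/new
  zfs destroy -r tank/old
  zfs rename tank/new tank/old

The receive writes brand-new blocks, so the allocator spreads them across all
vdevs, including the new one. A plain cp or rsync into a freshly created
filesystem does the same job.
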
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Identifying drives (SATA)

2011-02-06 Thread Orvar Korvar
Heh. My bad. Didn't read the command. Yes, that should be safe.
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] zfs-discuss Digest, Vol 64, Issue 13

2011-02-06 Thread Michael Armstrong
Additionally, the way I do it is to draw a diagram of the drives in the system, 
labelled with the drive serial numbers. Then when a drive fails, I can find out 
from smartctl which drive it is and remove/replace without trial and error.
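
For example, something like this prints the serial for one disk (the device
path is just an example, and depending on the controller smartctl may need a
-d option to talk to it):

  smartctl -i /dev/rdsk/c8t3d0s0 | grep -i serial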

On 5 Feb 2011, at 21:54, zfs-discuss-requ...@opensolaris.org wrote:

> 
> Message: 7
> Date: Sat, 5 Feb 2011 15:42:45 -0500
> From: rwali...@washdcmail.com
> To: David Dyer-Bennet 
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Identifying drives (SATA)
> Message-ID: <58b53790-323b-4ae4-98cd-575f93b66...@washdcmail.com>
> Content-Type: text/plain; charset=us-ascii
> 
> 
> On Feb 5, 2011, at 2:43 PM, David Dyer-Bennet wrote:
> 
>> Is there a clever way to figure out which drive is which?  And if I have to 
>> fall back on removing a drive I think is right, and seeing if that's true, 
>> what admin actions will I have to perform to get the pool back to safety?  
>> (I've got backups, but it's a pain to restore of course.) (Hmmm; in 
>> single-user mode, use dd to read huge chunks of one disk, and see which 
>> lights come on?  Do I even need to be in single-user mode to do that?)
> 
> Obviously this depends on your lights working to some extent (the right light 
> doing something when the right disk is accessed), but I've used:
> 
> dd if=/dev/rdsk/c8t3d0s0 of=/dev/null bs=4k count=10
> 
> which someone mentioned on this list.  Assuming you can actually read from 
> the disk (it isn't completely dead), it should allow you to direct traffic to 
> each drive individually.
> 
> Good luck,
> Ware



Re: [zfs-discuss] ZFS and TRIM - No need for TRIM

2011-02-06 Thread Erik Trimble

On 2/6/2011 3:51 AM, Orvar Korvar wrote:

Ok, so can we say that the conclusion for a home user is:

1) Using SSD without TRIM is acceptable. The only drawback is that without 
TRIM, the SSD will write much more, which affects lifetime, because when the 
SSD has written enough, it will break.

I don't have high demands for my OS disk, so battery backup is overkill for my 
needs.

So I can happily settle for the next gen Intel G3 SSD disk, without worrying 
the SSD will break because Solaris has no TRIM support yet?


Yes. All modern SSDs will wear out, but, even without TRIM support, it 
will be a significant time (5+ years) before they do.  Internal 
wear-leveling by the SSD controller results in an expected lifespan 
about the same as hard drives.


TRIM really only impacts performance. For the ZFS ZIL use case, TRIM has 
only a small impact on performance - SSD performance for ZIL drops off 
quickly from peak, and supporting TRIM would only slightly mitigate this.


For home use, lack of TRIM support won't noticeably change your 
performance as a ZIL cache or lower the lifespan of the SSD.



The Intel X25-M (either G3 or G2) would be sufficient for your purposes.

In general, we do strongly recommend you put a UPS on your system, to 
avoid cache corruption in case of power outages.




2) And later, when Solaris gets TRIM support, should I reformat or is there no 
need to reformat? I mean, maybe I must format and reinstall to get TRIM all 
over the disk. Or will TRIM immediately start to do its magic?


If/when TRIM is supported by ZFS, I would expect this to be transparent to 
you, the end-user. You'd have to upgrade the OS to the proper new 
patchlevel, and *possibly* run a 'zpool upgrade' to update the various 
pools to the latest version, but I suspect the latter will be completely 
unnecessary. TRIM support would come in ZFS's guts, not in the pool 
format.  Worst case is that you'd have to enable TRIM at the device 
layer, which would probably entail either editing a config file and 
rebooting, or just running some command to enable the feature.   I can't 
imagine it would require any reformatting or reinstalling.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA





Re: [zfs-discuss] Identifying drives (SATA), question about hot spare allocation

2011-02-06 Thread David Dyer-Bennet

Following up to myself, I think I've got things sorted, mostly.

1.  The thing I was most sure of, I was wrong about.  Some years back, I 
must have split the mirrors so that they used different brand disks.  I 
probably did this, maybe even accidentally, when I had to restore from 
backups at one point.   I suppose I could have physically labeled the 
carriers...no, that's crazy talk!


2.  The dd trick doesn't produce reliable activity light activation in 
my system.  I think some of the drives and/or controllers only turn on 
the activity light for writes.


3.  However, in spite of all this, I have replaced the disks in mirror-0 
with the bigger disks (via attach-new-resilver-detach-old), and added 
the third drive I bought as a hot spare.  All without having to restore 
from backups.


4.  AND I know which physical drive the detached 400GB drive is.  It 
occurs to me I could make that a second hot spare -- there are 4 
remaining 400GB drives in the pool, so it's useful for 2/3 of the 
failures by drive count.


Leading to a new question -- is ZFS smart about hot spare sizes?  Will 
it skip over too-small drives?  Will it, even better, prefer smaller 
drives to larger so long as they are big enough (thus leaving the big 
drives for bigger failures)?


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info


Re: [zfs-discuss] ZFS and TRIM - No need for TRIM

2011-02-06 Thread Roy Sigurd Karlsbakk
> 2) And later, when Solaris gets TRIM support, should I reformat or is
> there no need to reformat? I mean, maybe I must format and reinstall
> to get TRIM all over the disk. Or will TRIM immediately start to do
> it's magic?

TRIM works at the device level, so a reformat won't be necessary.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] Replace block devices to increase pool size

2011-02-06 Thread taemun
If autoexpand = on, then yes.
zpool get autoexpand <pool>
zpool set autoexpand=on <pool>

The expansion is vdev specific, so if you replaced the mirror first, you'd
get that much (the extra 2TB) without touching the raidz.
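
For example, the whole swap could look something like this (sketch only; pool
and device names are made up, and each replace must finish resilvering before
the next disk comes out):

  zpool set autoexpand=on tank
  zpool replace tank c0t1d0 c0t5d0    (old 1TB disk, new 2TB disk; repeat for the other three)

If autoexpand was off while the disks were swapped, running 'zpool online -e'
on each replaced device afterwards should pick up the extra space.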

Cheers,

On 7 February 2011 01:41, Achim Wolpers  wrote:

> Hi!
>
> I have a zpool built up from two vdevs (one mirror and one raidz). The
> raidz is built up from 4x1TB HDs. When I successively replace each 1TB
> drive with a 2TB drive will the capacity of the raidz double after the
> last block device is replaced?
>
> Achim
>
>
>


[zfs-discuss] Replace block devices to increase pool size

2011-02-06 Thread Achim Wolpers
Hi!

I have a zpool built up from two vdevs (one mirror and one raidz). The
raidz is built up from 4x1TB HDs. When I successively replace each 1TB
drive with a 2TB drive will the capacity of the raidz double after the
last block device is replaced?

Achim






Re: [zfs-discuss] Identifying drives (SATA)

2011-02-06 Thread David Dyer-Bennet

On 2011-02-06 05:58, Orvar Korvar wrote:

Will this not ruin the zpool? If you overwrite one of discs in the zpool won't 
the zpool go broke, so you need to repair it?


Without quoting I can't tell what you think you're responding to, but 
from my memory of this thread, I THINK you're forgetting how dd works. 
The dd commands being proposed to create drive traffic are all read-only 
accesses, so they shouldn't damage anything.


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info


Re: [zfs-discuss] Identifying drives (SATA)

2011-02-06 Thread Roy Sigurd Karlsbakk
> Will this not ruin the zpool? If you overwrite one of discs in the
> zpool won't the zpool go broke, so you need to repair it?

As suggested, dd if=/dev/rdsk/c8t3d0s0 of=/dev/null bs=4k count=10, that 
will do its best to overwrite /dev/null, which the system is likely to allow :P

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] Identifying drives (SATA)

2011-02-06 Thread Orvar Korvar
Will this not ruin the zpool? If you overwrite one of discs in the zpool won't 
the zpool go broke, so you need to repair it?
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] ZFS and TRIM - No need for TRIM

2011-02-06 Thread Orvar Korvar
Ok, so can we say that the conclusion for a home user is:

1) Using SSD without TRIM is acceptable. The only drawback is that without 
TRIM, the SSD will write much more, which affects lifetime, because when the 
SSD has written enough, it will break.

I don't have high demands for my OS disk, so battery backup is overkill for my 
needs. 

So I can happily settle for the next gen Intel G3 SSD disk, without worrying 
the SSD will break because Solaris has no TRIM support yet?




2) And later, when Solaris gets TRIM support, should I reformat or is there no 
need to reformat? I mean, maybe I must format and reinstall to get TRIM all 
over the disk. Or will TRIM immediately start to do its magic?
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] ZFS/Drobo (Newbie) Question

2011-02-06 Thread Orvar Korvar
Yes, you create three groups as you described and insert them into your zpool 
(the ZFS raid). So you have only one ZFS raid, consisting of three groups. You 
don't have three different ZFS raids (unless you configure it that way).

You can also, later, swap one disk for a larger one and let the group resilver. Then 
you swap the next disk for a larger one, etc. When all disks are swapped, the group 
will be bigger.

And remember, you can never change the number of disks in a group. But you can 
add a new group. And you can also grow the group by swapping each disk for a 
larger one.
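
For example (pool and device names are just placeholders):

  zpool add tank raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0    (adds a new group/vdev to the pool)
  zpool replace tank c1t0d0 c4t0d0                    (swaps one disk in an existing group for a bigger one)

Once every disk in a group has been replaced with a bigger one (and autoexpand
is on), the group grows.
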
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Drive id confusion

2011-02-06 Thread Chris Ridd

On 6 Feb 2011, at 03:14, David Dyer-Bennet wrote:

> I'm thinking either Solaris' appalling mess of device files is somehow scrod, 
> or else ZFS is confused in its reporting (perhaps because of cache file 
> contents?).  Is there anything I can do about either of these?  Does devfsadm 
> really create the apporpirate /dev/dsk and etc. files based on what's present?

Is reviewing the source code to devfsadm helpful? I bet it hasn't changed much 
from:



Chris


Re: [zfs-discuss] ZFS and spindle speed (7.2k / 10k / 15k)

2011-02-06 Thread Brandon High
On Sat, Feb 5, 2011 at 3:34 PM, Roy Sigurd Karlsbakk  wrote:
>> so as not to exceed the channel bandwidth. When they need to get higher disk
>> capacity, they add more platters.
>
> Might this mean those drives are more robust in terms of reliability, since 
> leakage between sectors is less likely at the lower density?

More platters lead to more heat and higher power consumption. Most
drives have 3 or 4 platters, though Hitachi usually manufactures 5-platter
drives as well.

-B

-- 
Brandon High : bh...@freaks.com