Re: [zfs-discuss] Drive upgrades

2012-04-17 Thread Peter Jeremy
On 2012-Apr-17 17:25:36 +1000, Jim Klimov  wrote:
>For the sake of the archives, can you please post a common troubleshooting
>technique which users can try at home to see if their disks honour the
>request or not? ;) I guess it would involve comparing random-write bandwidth
>in the two cases?

1) Issue "disable write cache" command to drive
2) Write several MB of data to drive
3) As soon as the drive acknowledges completion, remove power to the drive (this
   will require an electronic switch in the drive's power lead)
4) Wait until drive spins down.
5) Power up drive and wait until ready
6) Verify data written in (2) can be read.
7) Argue with drive vendor that drive doesn't meet specifications :-)

A similar approach can also be used to verify that NCQ & cache flush
commands actually work.
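
A rough, at-your-own-risk sketch of steps 1-6 (assumes a scratch disk
c7t0d0 whose contents are expendable, and reuses the format -e trick posted
elsewhere in this thread; the power cut in step 3 still has to be done by
hand or via an external relay):

DISK=c7t0d0

# 1) disable the on-disk write cache
( echo cache; echo write; echo disable ) | format -e -d $DISK

# 2) write a known pattern to the raw device and note its checksum
dd if=/dev/urandom of=/tmp/pattern bs=1024k count=8
cksum /tmp/pattern
dd if=/tmp/pattern of=/dev/rdsk/${DISK}s0 bs=1024k

# 3)-5) cut power the moment the second dd returns, wait for spin-down,
#       then power the drive back up

# 6) read the region back and compare checksums; a mismatch means the
#    drive acknowledged writes it had not committed to the platters
dd if=/dev/rdsk/${DISK}s0 of=/tmp/readback bs=1024k count=8
cksum /tmp/pattern /tmp/readback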

-- 
Peter Jeremy




Re: [zfs-discuss] Drive upgrades

2012-04-17 Thread Richard Elling
On Apr 17, 2012, at 12:25 AM, Jim Klimov wrote:

> 2012-04-17 5:15, Richard Elling wrote:
>> For the archives...
>> 
>> Write-back cache enablement is toxic for file systems that do not issue
>> cache flush commands, such as Solaris' UFS. In the early days of ZFS,
>> on Solaris 10 or before ZFS was bootable on OpenSolaris, it was not
>> uncommon to have ZFS and UFS on the same system.
>> 
>> NB, there are a number of consumer-grade IDE/*ATA disks that ignore
>> disabling the write buffer. Hence, it is not always a win to enable the
>> write buffer that cannot be disabled.
>> -- richard
>> -- richard
> 
> For the sake of the archives, can you please post a common troubleshooting
> technique which users can try at home to see if their disks honour the
> request or not? ;) I guess it would involve comparing random-write bandwidth
> in the two cases?

I am aware of only one method that is guaranteed to work: contact the 
manufacturer, sign an NDA, and read the docs.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Drive upgrades

2012-04-17 Thread Jim Klimov

2012-04-17 5:15, Richard Elling wrote:
> For the archives...
> 
> Write-back cache enablement is toxic for file systems that do not issue
> cache flush commands, such as Solaris' UFS. In the early days of ZFS,
> on Solaris 10 or before ZFS was bootable on OpenSolaris, it was not
> uncommon to have ZFS and UFS on the same system.
> 
> NB, there are a number of consumer-grade IDE/*ATA disks that ignore
> disabling the write buffer. Hence, it is not always a win to enable the
> write buffer that cannot be disabled.
> -- richard


For the sake of the archives, can you please post a common troubleshooting
technique which users can try at home to see if their disks honour the
request or not? ;) I guess it would involve comparing random-write bandwidth
in the two cases?

And for the sake of the archives, here's what I do on my home system to
toggle the write cache on the disks its pools use (this could be scripted
better to detect the disk names from the zpool listing, as sketched after
the script, but it works for me as-is):

# cat /etc/rc2.d/S95disable-pool-wcache
#!/bin/sh
# Toggle the on-disk write cache on the pool's member disks (c7t0d0..c7t5d0):
# "start" disables the cache, "stop" re-enables it, and anything else just
# displays the current setting.

case "$1" in
start)
    for C in 7; do for T in 0 1 2 3 4 5; do
        ( echo cache; echo write; echo display; echo disable; echo display ) \
            | format -e -d c${C}t${T}d0 &
    done; done
    wait
    sync
    ;;
stop)
    for C in 7; do for T in 0 1 2 3 4 5; do
        ( echo cache; echo write; echo display; echo enable; echo display ) \
            | format -e -d c${C}t${T}d0 &
    done; done
    wait
    sync
    ;;
*)
    for C in 7; do for T in 0 1 2 3 4 5; do
        ( echo cache; echo write; echo display ) \
            | format -e -d c${C}t${T}d0 &
    done; done
    wait
    sync
    ;;
esac
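
For the archives, a sketch of deriving the disk list from the pool itself
instead of hard-coding c7t{0..5}d0 (assumes whole-disk vdevs named cXtYdZ
in a pool called "tank"; adjust the pattern if your vdevs are slices):

DISKS=`zpool status tank | nawk '/c[0-9]+t[0-9]+d[0-9]+/ { sub(/s[0-9]+$/, "", $1); print $1 }' | sort -u`
for D in $DISKS; do
    ( echo cache; echo write; echo display ) | format -e -d $D &
done
wait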


Re: [zfs-discuss] Drive upgrades

2012-04-16 Thread Richard Elling
For the archives...

On Apr 16, 2012, at 3:37 PM, Peter Jeremy wrote:

> On 2012-Apr-14 02:30:54 +1000, Tim Cook  wrote:
>> You will however have an issue replacing them if one should fail.  You need 
>> to have the same block count to replace a device, which is why I asked for a 
>> "right-sizing" years ago.
> 
> The "traditional" approach this is to slice the disk yourself so you have a 
> slice size with a known area and a dummy slice of a couple of GB in case a 
> replacement is a bit smaller.  Unfortunately, ZFS on Solaris disables the 
> drive cache if you don't give it a complete disk so this approach incurs as 
> significant performance overhead there.  FreeBSD leaves the drive cache 
> enabled in either situation.  I'm not sure how OI or Linux behave.

Write-back cache enablement is toxic for file systems that do not issue cache 
flush commands, such as Solaris' UFS. In the early days of ZFS, on Solaris 10 or
before ZFS was bootable on OpenSolaris, it was not uncommon to have ZFS and
UFS on the same system.

NB, there are a number of consumer-grade IDE/*ATA disks that ignore disabling
the write buffer. Hence, it is not always a win to enable the write buffer that
cannot be disabled.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Drive upgrades

2012-04-16 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Peter Jeremy
> 
> On 2012-Apr-14 02:30:54 +1000, Tim Cook  wrote:
> >You will however have an issue replacing them if one should fail.  You need
> >to have the same block count to replace a device, which is why I asked for a
> >"right-sizing" years ago.
> 
> The "traditional" approach to this is to slice the disk yourself so you have
> a slice of a known size and a dummy slice of a couple of GB in case a
> replacement is a bit smaller.  Unfortunately, ZFS on Solaris disables the
> drive cache if you don't give it a complete disk, so this approach incurs a
> significant performance overhead there.

It's not so much that it "disables" it, as that it "doesn't enable" it.  By
default, the on-disk write-back cache is disabled for everything; but if you're
using the whole disk for ZFS, then ZFS enables it, because that's known to be
safe.  (Unless... nevermind.)

Whenever I've deployed ZFS on partitions, I just script the enabling of the
write-back cache.  So Peter's message is true, but it's solvable.
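
One possible shape for such a script, purely as a sketch (the disk names
are placeholders, and it should only ever be pointed at disks whose sole
consumer issues cache flushes, i.e. ZFS):

for D in c0t0d0 c0t1d0; do
    ( echo cache; echo write; echo enable; echo display ) | format -e -d $D
done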



Re: [zfs-discuss] Drive upgrades

2012-04-16 Thread Peter Jeremy
On 2012-Apr-14 02:30:54 +1000, Tim Cook  wrote:
>You will however have an issue replacing them if one should fail.  You need to 
>have the same block count to replace a device, which is why I asked for a 
>"right-sizing" years ago.

The "traditional" approach this is to slice the disk yourself so you have a 
slice size with a known area and a dummy slice of a couple of GB in case a 
replacement is a bit smaller.  Unfortunately, ZFS on Solaris disables the drive 
cache if you don't give it a complete disk so this approach incurs as 
significant performance overhead there.  FreeBSD leaves the drive cache enabled 
in either situation.  I'm not sure how OI or Linux behave.
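
For example, on FreeBSD the slicing could look roughly like this (a sketch
only; the label name and the 1860g size are placeholder choices that leave
a couple of GB unused at the end of a nominal 2TB disk):

gpart create -s gpt ada1
gpart add -t freebsd-zfs -a 1m -s 1860g -l tank-disk1 ada1
zpool create tank gpt/tank-disk1    # or attach/replace into an existing pool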

-- 
Peter Jeremy




Re: [zfs-discuss] Drive upgrades

2012-04-15 Thread Daniel Carosone
On Sat, Apr 14, 2012 at 09:04:45AM -0400, Edward Ned Harvey wrote:
> Then, about 2 weeks later, the support rep emailed me to say they
> implemented a new feature, which could autoresize +/- some small
> percentage difference, like 1Mb difference or something like that. 

There are two elements to this:
 - the size of actual data on the disk
 - the logical block count, and the resulting LBAs of the labels
   positioned relative to the end of the disk.

The available size of the disk has always been rounded to a whole
number of metaslabs, once the front and back label space is trimmed
off. Combined with the fact that metaslab size is determined
dynamically at vdev creation time based on device size, there can
easily be some amount of unused space at the end, after the last
metaslab and before the end labels. 

It is slop in this space that allows for the small differences you
describe above, even for disks laid out in earlier zpool versions.  
A little poking with zdb and a few calculations will show you just how
much a given disk has. 
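
In concrete terms, the poking might look something like this (a sketch;
"tank" and the device path are placeholders, and the output format varies
a bit between zdb versions):

zdb -l /dev/rdsk/c0t0d0s0   # vdev labels, including asize and ashift
zdb -m tank                 # metaslab layout (count/offsets) per vdev
# unused tail ~= vdev asize - (metaslab count * metaslab size) - label space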

However, to make the replacement actually work, the zpool code needed
to not insist on an absolute >= number of blocks (but rather to check the
more proper condition, that there was room for all the metaslabs).
There was also testing to ensure that it handled the end labels moving
inwards in absolute position, for a replacement onto slightly smaller
rather than same/larger disks. That was the change that happened at
the time.

(If you somehow had disks that fit exactly a whole number of
metaslabs, you might still have an issue, I suppose. Perhaps that's
likely if you carefully calculated LUN sizes to carve out of some
other storage, in which case you can do the same for replacements.)

--
Dan.





Re: [zfs-discuss] Drive upgrades

2012-04-14 Thread Richard Elling
http://wesunsolve.net/bugid/id/6563887
 -- richard

On Apr 14, 2012, at 6:04 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Freddie Cash
>> 
>> I thought ZFSv20-something added a "if the blockcount is within 10%,
>> then allow the replace to succeed" feature, to work around this issue?
> 
> About 2 yrs ago, I replaced a drive with 1 block less, and it was a big 
> problem.  This was a drive bought from oracle, to replace an oracle drive, on 
> a supported sun system, and it was the same model drive, with a higher 
> firmware rev.  We worked on it extensively, eventually managed to shoe-horn 
> the drive in there, and I pledged I would always partition drives slightly 
> smaller from now on.
> 
> Then, about 2 weeks later, the support rep emailed me to say they implemented 
> a new feature, which could autoresize +/- some small percentage difference, 
> like 1Mb difference or something like that.
> 
> So there is some solid reason to corroborate Freddie's suspicion, but there's 
> no way I'm going to find any material to reference now.  The timing even 
> sounds about right to support the v20 idea.  I haven't tested or proven it 
> myself, but I am confidently assuming moving forward, that small variations 
> will be handled gracefully.
> 

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Drive upgrades

2012-04-14 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Freddie Cash
> 
> I thought ZFSv20-something added a "if the blockcount is within 10%,
> then allow the replace to succeed" feature, to work around this issue?

About 2 yrs ago, I replaced a drive with one that had 1 block less, and it was a 
big problem.  This was a drive bought from Oracle, to replace an Oracle drive, on 
a supported Sun system, and it was the same model drive with a higher firmware 
rev.  We worked on it extensively, eventually managed to shoe-horn the drive in 
there, and I pledged I would always partition drives slightly smaller from now 
on.

Then, about 2 weeks later, the support rep emailed me to say they had implemented 
a new feature, which could autoresize +/- some small difference, like 1MB or 
something like that.

So there is some solid reason to corroborate Freddie's suspicion, but there's 
no way I'm going to find any material to reference now.  The timing even sounds 
about right to support the v20 idea.  I haven't tested or proven it myself, but 
I am confidently assuming, moving forward, that small variations will be handled 
gracefully.



Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Stephan Budach

On 13.04.12 19:22, Tim Cook wrote:
> On Fri, Apr 13, 2012 at 11:46 AM, Freddie Cash  wrote:
>> On Fri, Apr 13, 2012 at 9:30 AM, Tim Cook  wrote:
>>> You will however have an issue replacing them if one should fail.  You need
>>> to have the same block count to replace a device, which is why I asked for a
>>> "right-sizing" years ago.  Deaf ears :/
>>
>> I thought ZFSv20-something added a "if the blockcount is within 10%,
>> then allow the replace to succeed" feature, to work around this issue?
>>
>> --
>> Freddie Cash
>> fjwc...@gmail.com
>
> That would be news to me.  I'd love to hear it's true though.  When I
> made the request there was excuse after excuse about how it would be
> difficult and Sun always provided replacement drives of identical
> size, etc (although there were people who responded who in fact
> had received drives from Sun of different sizes in RMA).  I was hoping
> now that the braintrust had moved on from Sun that they'd embrace what
> I consider a common-sense decision, but I don't think it's happened.



I tend to think that S11 has even tightened this further. When I 
upgraded from SE11 to S11, a couple of drives became "corrupt" when S11 
tried to import the zpool, which consists of mirror vdevs. Switching back 
to SE11 and importing the very same zpool went without issue.


In the SR I opened for that issue, it was stated that S11 is even more 
picky about drive sizes than SE11 had been, and I had to replace the 
drives with new ones. Interestingly, these were all Hitachi drives of the 
"same" size, but prtvtoc in fact displayed fewer sectors for the ones 
that were refused by S11 - and the percentage was closer to 5% than to 
10%, AFAIR.


I was later able to create another zpool in S11 from the drives it had 
refused, though.


Cheers,
budy


Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Tim Cook
On Fri, Apr 13, 2012 at 11:46 AM, Freddie Cash  wrote:

> On Fri, Apr 13, 2012 at 9:30 AM, Tim Cook  wrote:
> > You will however have an issue replacing them if one should fail.  You need
> > to have the same block count to replace a device, which is why I asked for a
> > "right-sizing" years ago.  Deaf ears :/
>
> I thought ZFSv20-something added a "if the blockcount is within 10%,
> then allow the replace to succeed" feature, to work around this issue?
>
> --
> Freddie Cash
> fjwc...@gmail.com
>


That would be news to me.  I'd love to hear it's true though.  When I made
the request there was excuse after excuse about how it would be difficult
and Sun always provided replacement drives of identical size, etc (although
there were people who responded who in fact had received drives from Sun of
different sizes in RMA).  I was hoping now that the braintrust had moved on
from Sun that they'd embrace what I consider a common-sense decision, but I
don't think it's happened.

--Tim


Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Michael Armstrong
Yes, this is another thing I'm wary of... I should have slightly 
under-provisioned at the start or mixed manufacturers... Now I may have to 
replace failed 2tb drives with 2.5tb ones for the sake of a block.

Sent from my iPhone

On 13 Apr 2012, at 17:30, Tim Cook  wrote:

> 
> 
> On Fri, Apr 13, 2012 at 9:35 AM, Edward Ned Harvey 
>  wrote:
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Michael Armstrong
> >
> > Is there a way to quickly ascertain if my seagate/hitachi drives are as
> > large as the 2.0tb samsungs? I'd like to avoid the situation of replacing
> > all drives and then not being able to grow the pool...
> 
> It doesn't matter.  If you have a bunch of drives that are all approx the
> same size but vary slightly, and you make (for example) a raidz out of them,
> then the raidz will only be limited by the size of the smallest one.  So you
> will only be wasting 1% of the drives that are slightly larger.
> 
> Also, given that you have a pool currently made up of 13x2T and 5x1T ... I
> presume these are separate vdev's.  You don't have one huge 18-disk raidz3,
> do you?  that would be bad.  And it would also mean that you're currently
> wasting 13x1T.  I assume the 5x1T are a single raidzN.  You can increase the
> size of these disks, without any cares about the size of the other 13.
> 
> Just make sure you have the autoexpand property set.
> 
> But most of all, make sure you do a scrub first, and make sure you complete
> the resilver in between each disk swap.  Do not pull out more than one disk
> (or whatever your redundancy level is) while it's still resilvering from the
> previously replaced disk.  If you're very thorough, you would also do a
> scrub in between each disk swap, but if it's just a bunch of home movies
> that are replaceable, you will probably skip that step.
> 
> 
> You will however have an issue replacing them if one should fail.  You need 
> to have the same block count to replace a device, which is why I asked for a 
> "right-sizing" years ago.  Deaf ears :/
> 
> --Tim


Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Freddie Cash
On Fri, Apr 13, 2012 at 9:30 AM, Tim Cook  wrote:
> You will however have an issue replacing them if one should fail.  You need
> to have the same block count to replace a device, which is why I asked for a
> "right-sizing" years ago.  Deaf ears :/

I thought ZFSv20-something added a "if the blockcount is within 10%,
then allow the replace to succeed" feature, to work around this issue?

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Tim Cook
On Fri, Apr 13, 2012 at 9:35 AM, Edward Ned Harvey <
opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Michael Armstrong
> >
> > Is there a way to quickly ascertain if my seagate/hitachi drives are as
> > large as the 2.0tb samsungs? I'd like to avoid the situation of replacing
> > all drives and then not being able to grow the pool...
>
> It doesn't matter.  If you have a bunch of drives that are all approx the
> same size but vary slightly, and you make (for example) a raidz out of
> them,
> then the raidz will only be limited by the size of the smallest one.  So
> you
> will only be wasting 1% of the drives that are slightly larger.
>
> Also, given that you have a pool currently made up of 13x2T and 5x1T ... I
> presume these are separate vdev's.  You don't have one huge 18-disk raidz3,
> do you?  that would be bad.  And it would also mean that you're currently
> wasting 13x1T.  I assume the 5x1T are a single raidzN.  You can increase
> the
> size of these disks, without any cares about the size of the other 13.
>
> Just make sure you have the autoexpand property set.
>
> But most of all, make sure you do a scrub first, and make sure you complete
> the resilver in between each disk swap.  Do not pull out more than one disk
> (or whatever your redundancy level is) while it's still resilvering from
> the
> previously replaced disk.  If you're very thorough, you would also do a
> scrub in between each disk swap, but if it's just a bunch of home movies
> that are replaceable, you will probably skip that step.
>


You will however have an issue replacing them if one should fail.  You need
to have the same block count to replace a device, which is why I asked for
a "right-sizing" years ago.  Deaf ears :/

--Tim




Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Michael Armstrong
> 
> Is there a way to quickly ascertain if my seagate/hitachi drives are as
> large as the 2.0tb samsungs? I'd like to avoid the situation of replacing
> all drives and then not being able to grow the pool...

It doesn't matter.  If you have a bunch of drives that are all approx the
same size but vary slightly, and you make (for example) a raidz out of them,
then the raidz will only be limited by the size of the smallest one.  So you
will only be wasting 1% of the drives that are slightly larger.

Also, given that you have a pool currently made up of 13x2T and 5x1T ... I
presume these are separate vdevs.  You don't have one huge 18-disk raidz3,
do you?  That would be bad.  And it would also mean that you're currently
wasting 13x1T.  I assume the 5x1T are a single raidzN.  You can increase the
size of these disks without any cares about the size of the other 13.

Just make sure you have the autoexpand property set.

But most of all, make sure you do a scrub first, and make sure you complete
the resilver in between each disk swap.  Do not pull out more than one disk
(or whatever your redundancy level is) while it's still resilvering from the
previously replaced disk.  If you're very thorough, you would also do a
scrub in between each disk swap, but if it's just a bunch of home movies
that are replaceable, you will probably skip that step.
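
In zpool terms, the sequence above is roughly the following (the pool and
device names are placeholders):

zpool set autoexpand=on tank
zpool scrub tank              # scrub first; wait for it to finish cleanly
zpool replace tank c0t5d0     # after physically swapping the 1T for a 2T disk
zpool status tank             # wait here until the resilver completes
# ...repeat the replace/status (and optionally a scrub) for each disk in turn
zpool list tank               # with autoexpand=on, capacity grows after the last swap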



Re: [zfs-discuss] Drive upgrades

2012-04-13 Thread Volker A. Brandt
Michael Armstrong writes:

> Is there a way to quickly ascertain if my seagate/hitachi drives are as
> large as the 2.0tb samsungs? I'd like to avoid the situation of replacing
> all drives and then not being able to grow the pool...

Hitachi prints the block count of the drives on the physical product label.
If you compare that number to the one given in the Solaris label as
printed by the prtvtoc command, you should be able to answer your
question.

Don't know about the Seagate drives, but they should at least have a block
count somewhere in their documentation.
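
For example (c0t0d0 is a placeholder; whether you point prtvtoc at s2 or
s0 depends on whether the disk carries an SMI or an EFI label):

prtvtoc /dev/rdsk/c0t0d0s2
# The "Dimensions" section of the output reports bytes/sector and the
# accessible sector count; compare that against the count on the drive label.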


HTH -- Volker
-- 

Volker A. Brandt   Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANYEmail: v...@bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"


[zfs-discuss] Drive upgrades

2012-04-13 Thread Michael Armstrong
Hi Guys,

I currently have an 18 drive system built from 13x 2.0tb Samsungs and 5x WD
1tb's... I'm about to swap out all of my 1tb drives with 2tb ones to grow the
pool a bit... My question is...

The replacement 2tb drives are from various manufacturers
(seagate/hitachi/samsung) and I know from previous experience that the
geometry/boundaries of each manufacturer's 2tb offerings are different.

Is there a way to quickly ascertain if my seagate/hitachi drives are as large
as the 2.0tb samsungs? I'd like to avoid the situation of replacing all drives
and then not being able to grow the pool...

Thanks,
Michael