Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/28/12 08:48, Nathan Kroenert wrote:


Looking to get some larger drives for one of my boxes. It runs
exclusively ZFS and has been using Seagate 2TB units up until now (which
are 512 byte sector).

Anyone offer up suggestions of either 3 or preferably 4TB drives that
actually work well with ZFS out of the box? (And not perform like
rubbish)...

I'm using Oracle Solaris 11 , and would prefer not to have to use a
hacked up zpool to create something with ashift=12.


Are you replacing a failed drive or creating a new pool?

I had a drive in a mirrored pool recently fail.  Both
drives were 1TB Seagate ST310005N1A1AS-RK with 512 byte sectors.
All the 1TB Seagate boxed drives I could find with the same
part number on the box (with factory seals in place)
were really ST1000DM003-9YN1 with 512e/4096p.  Just being
cautious, I ended up migrating the pools over to a pair
of the new drives.  The pools were created with ashift=12
automatically:

  $ zdb -C | grep ashift
  ashift: 12
  ashift: 12
  ashift: 12

Resilvering the three pools concurrently went fairly quickly:

  $ zpool status
  scan: resilvered 223G in 2h14m with 0 errors on Tue May 22 21:02:32 2012
  scan: resilvered 145G in 4h13m with 0 errors on Tue May 22 23:02:38 2012
  scan: resilvered 153G in 3h44m with 0 errors on Tue May 22 22:30:51 2012
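
For a rough feel of what "fairly quickly" means, the average rate per pool can be worked out from the resilvered size and the elapsed time. This is only a sketch: it assumes the "G" in the zpool status output means GiB, it uses integer arithmetic (so results are rounded down), and the function name resilver_rate is made up for illustration.

```shell
# Average resilver rate in MiB/s from a "resilvered <N>G in <H>h<M>m" line.
# Assumption: G means GiB. Integer arithmetic, so the result is rounded down.
resilver_rate() {
    gib=$1
    hours=$2
    minutes=$3
    secs=$(( hours * 3600 + minutes * 60 ))
    echo $(( gib * 1024 / secs ))
}

resilver_rate 223 2 14   # 223G in 2h14m
resilver_rate 145 4 13   # 145G in 4h13m
resilver_rate 153 3 44   # 153G in 3h44m
```

Since the three resilvers ran concurrently, the per-pool figures understate what the hardware was delivering in total.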


What performance problem do you expect?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread bofh
On Tue, May 29, 2012 at 6:54 AM, John Martin john.m.mar...@oracle.com wrote:
  $ zdb -C | grep ashift
              ashift: 12
              ashift: 12
              ashift: 12


That's interesting.  I just created a raidz3 pool out of 7x3TB drives.
My drives were:
ST3000DM001-9YN1
Hitachi HDS72303
Hitachi HDS72303
ST3000DM001-9YN1
Hitachi HDS5C303
Hitachi HDS5C303
ST33000651AS

The pool came up with ashift: 9.  Is that standard?  I did nothing but
plug them in and run zpool create.  They seem to run pretty fast; I can
get up to 400 MB/s writes from /dev/zero... :)





Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Nathan Kroenert

 Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance 
was worse than woeful.


But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for 
just about everything was 20MB/s per spindle or worse, when it should 
have been closer to 100MB/s when streaming. Things were worse still when 
doing random...


I'm actually looking to put in something larger than the 3*2TB drives 
(triple mirror for read perf) this pool has in it - preferably 3 * 4TB 
drives. (I don't want to put in more spindles - just replace the current 
ones...)


I might just have to bite the bullet and try something with current SW. :).

Nathan.




Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/29/12 08:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...

I'm actually looking to put in something larger than the 3*2TB drives
(triple mirror for read perf) this pool has in it - preferably 3 * 4TB
drives. (I don't want to put in more spindles - just replace the current
ones...)

I might just have to bite the bullet and try something with current SW. :).



Raw read from one of the mirrors:

#  timex dd if=/dev/rdsk/c0t2d0s2 of=/dev/null bs=1024000 count=1
1+0 records in
1+0 records out

real  49.26
user   0.01
sys    0.27


filebench filemicro_seqread reports an impossibly high number (4GB/s)
so the ARC is likely handling all reads.

The label on the boxes I bought says:

  1TB 32MB INTERNAL KIT 7200
  ST310005N1A1AS-RK
  S/N: ...
  PN:9BX1A8-573

The drives in the box were really
ST1000DM003-9YN162 with 64MB of cache.
I have multiple pools on each disk so the
cache should be disabled.  The drive reports
512 byte logical sectors and 4096 physical sectors.


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/29/12 07:26, bofh wrote:


ashift:9  is that standard?


Depends on what the drive reports as physical sector size.
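
The relationship is simply that ashift is the base-2 logarithm of the sector size ZFS assumes for the vdev, so a drive reporting 512-byte physical sectors gives 9 and one reporting 4096 gives 12. A minimal sketch of that arithmetic (the function name ashift_for is hypothetical, and this is not how ZFS itself is implemented):

```shell
# ashift is log2 of the sector size, found here by repeated halving.
# Assumes the input is a power of two, as sector sizes are.
ashift_for() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```

A 512e drive that reports 512 bytes for both logical and physical sector size would therefore still end up with ashift 9.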




Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Casper . Dik


The drives were the seagate green barracuda IIRC, and performance for 
just about everything was 20MB/s per spindle or worse, when it should 
have been closer to 100MB/s when streaming. Things were worse still when 
doing random...

It is possible that your partitions weren't aligned at 4K, and that will 
give serious issues with those drives. (Solaris now tries to make sure that 
all partitions are on 4K boundaries, or makes sure that the zpool dev_t is 
aligned to 4K.)

Casper



Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Jim Klimov

2012-05-29 16:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...


On one hand, it is possible that being green, the drives aren't very
capable of fast IO - they had different design goals and tradeoffs.

But actually I was going to ask if you paid attention to partitioning?
At what offsets did your ZFS pool data start? Was that offset divisible
by 4KB (i.e. 256 512-byte sectors, as is the default now, vs 34 sectors in
the older default)?

If the drive had 4KB native sectors but the logical FS blocks were
not aligned with that, then every write IO would involve RMW of
many sectors (perhaps disk's caching might alleviate this for
streaming writes though).
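
The two starting offsets mentioned above can be checked with a little arithmetic. This is a minimal sketch, assuming the start is given in 512-byte sectors and the drive uses 4096-byte physical sectors; the function name check_alignment is made up for illustration.

```shell
# Report whether a partition that starts at the given 512-byte sector
# begins on a 4 KiB boundary.
check_alignment() {
    start_sector=$1
    offset_bytes=$(( start_sector * 512 ))
    if [ $(( offset_bytes % 4096 )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

check_alignment 256   # newer default start: aligned
check_alignment 34    # older default start: misaligned
```

A start at sector 34 lands 1024 bytes past a 4 KiB boundary, so every 4 KiB filesystem block would straddle two physical sectors.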

Also note that ZFS IO often is random even for reads, since you
have to read metadata and file data often from different dispersed
locations. Again, OS caching helps statistically, when you have
much RAM dedicated to caching. Hmmm... did you use dedup in those
tests? That is another source of performance degradation on smaller
machines (under tens of GBs of RAM).

HTH,
//Jim


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Richard Elling
On May 29, 2012, at 6:10 AM, Jim Klimov wrote:
 Also note that ZFS IO often is random even for reads, since you
 have to read metadata and file data often from different dispersed
 locations.

This is true for almost all other file systems, too. For example, in UFS, 
metadata is stored in fixed locations on the disk as defined when the
filesystem is created.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread nathan

On 29/05/2012 11:10 PM, Jim Klimov wrote:

2012-05-29 16:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...


On one hand, it is possible that being green, the drives aren't very
capable of fast IO - they had different design goals and tradeoffs.


Indeed! I just wasn't expecting it to be so profound.

But actually I was going to ask if you paid attention to partitioning?
At what offsets did your ZFS pool data start? Was that offset divisible
by 4KB (i.e. 256 512byte sectors as is default now vs 34 sectors of
the older default)?

It was. Actually, I tried it in a variety of ways, including auto EFI 
partition (zpool create with the whole disk), using an SMI label, and 
trying a variety of tricks with offsets. Again, it was a while ago - 
before the time of the SD RMW fix...


If the drive had 4kb native sectors but the logical FS blocks were
not aligned with that, then every write IO would involve RMW of
many sectors (perhaps disk's caching might alleviate this for
streaming writes though).

Yep - that's what it *felt* like, and I didn't seem to be able to change 
that at the time.


Also note that ZFS IO often is random even for reads, since you
have to read metadata and file data often from different dispersed
locations. Again, OS caching helps statistically, when you have
much RAM dedicated to caching. Hmmm... did you use dedup in those
tests?- that is another source of performance degradation on smaller
machines (under tens of GBs of RAM).


At the time, I had 1TB of data, and 1TB of space... I'd expect that most 
of the data would have been written 'closeish' to sequential on disk, 
though I'll confess I only spent a short time looking at the 'physical' 
read/write locations being sent down through the stack. (where the drive 
writes them - well.. That's different. ;)


I have been contacted off list by a few folks that have indicated 
success with current drives and current Solaris bits. I'm thinking that 
it might be time to take another run at it.


I'll let the list know the results. ;)

Cheers

Nathan.



[zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-28 Thread Nathan Kroenert

 Hi folks,

Looking to get some larger drives for one of my boxes. It runs 
exclusively ZFS and has been using Seagate 2TB units up until now (which 
are 512 byte sector).


Anyone offer up suggestions of either 3 or preferably 4TB drives that 
actually work well with ZFS out of the box? (And not perform like 
rubbish)...


I'm using Oracle Solaris 11, and would prefer not to have to use a 
hacked up zpool to create something with ashift=12.


Thoughts on the best drives - or is Solaris 11 actually ready to go with 
whatever I throw at it? :)


And - am I doomed to have to use these so called 'advanced format' 
drives (which as far as I can tell are in no way actually advanced, and 
only benefit HDD makers and not the end user)?


Cheers!

Nathan.


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-28 Thread Nigel W
On Mon, May 28, 2012 at 6:48 AM, Nathan Kroenert nat...@tuneunix.com wrote:
 Anyone offer up suggestions of either 3 or preferably 4TB drives that
 actually work well with ZFS out of the box? (And not perform like
 rubbish)...
With our NCP 3 boxes the WD drives seem to be working okay (these are
consumer-level drives, which is fine for what we do with our NCP boxes
at $work).

On Mon, May 28, 2012 at 6:48 AM, Nathan Kroenert nat...@tuneunix.com wrote:
 And - am I doomed to have to use these so called 'advanced format' drives
 (which as far as I can tell are in no way actually advanced, and only
 benefit HDD makers and not the end user).

Yes.  All of the manufacturers are moving to advanced format
drives, more accurately known as 4K sector size drives.  After a snafu
last week at $work where a 512 byte pool would not resilver with a 4K
drive plugged in, it appears that (keep in mind that these are
consumer drives) Seagate no longer manufactures the 7200.12 series
drives, which have a selectable sector size.  The new 7200.14 series is
4k only.  WD for the time being appears to still present 512 byte
sectors in their current lineup.  What kind of performance penalty this
carries I don't know, as we have not tested any yet.  Presumably,
though, WD is going to stop doing that eventually, just like Seagate
already has.

No, you are correct, the "advanced format" name is just marketing.
But it does have very real benefits, and has everything to do (at least
as far as I understand the technical details from the HDD
manufacturers) with the bit density and the sector count getting so
high on these multi-terabyte drives that a larger and larger
percentage (and by extension absolute number of GBs) of the platter
area has to be used for ECC and sector-locating magic.
This means they are wasting more and more platter space by sticking
with 512-byte sectors.  By using larger sectors they can use fewer
sectors, which means less wasted space.  So all around it is good for
everyone; it's just that the HDD manufacturers are trying to change
15+ years of complacency with respect to the sector size of HDDs.
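
To put rough numbers on the wasted-space argument, the share of each on-disk sector that holds user data can be computed as data / (data + overhead). The overhead byte counts below (65 bytes for a 512-byte sector, 115 bytes for a 4096-byte sector) are illustrative figures often quoted for Advanced Format; they are assumptions here, not something stated in this thread, and the function name efficiency is made up.

```shell
# Format efficiency as a percentage with one decimal place:
# data bytes / (data bytes + per-sector overhead bytes), rounded to nearest.
# The overhead values passed in below are illustrative assumptions.
efficiency() {
    data=$1
    overhead=$2
    total=$(( data + overhead ))
    tenths=$(( (data * 1000 + total / 2) / total ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

efficiency 512 65     # 512-byte sectors
efficiency 4096 115   # 4096-byte sectors
```

With those assumed figures, eight 512-byte sectors carry 520 bytes of overhead for the same 4096 bytes of data that one large sector stores with 115.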

Nigel


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-28 Thread Richard Elling

On May 28, 2012, at 5:48 AM, Nathan Kroenert wrote:

 Hi folks,
 
 Looking to get some larger drives for one of my boxes. It runs exclusively 
 ZFS and has been using Seagate 2TB units up until now (which are 512 byte 
 sector).
 
 Anyone offer up suggestions of either 3 or preferably 4TB drives that 
 actually work well with ZFS out of the box? (And not perform like rubbish)...
 
 I'm using Oracle Solaris 11 , and would prefer not to have to use a hacked up 
 zpool to create something with ashift=12.
 
 Thoughts on the best drives - or is Solaris 11 actually ready to go with 
 whatever I throw at it? :)

Ashift is set automatically if the disk is truly 4k sector only (and
doesn't lie). This has been true for ZFS for a very, very long time.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-28 Thread Daniel Carosone
On Mon, May 28, 2012 at 09:23:25AM -0600, Nigel W wrote:
 After a snafu
 last week at $work where a 512 byte pool would not resilver with a 4K
 drive plugged in, it appears that (keep in mind that these are
 consumer drives) Seagate no longer manufactures the 7200.12 series
 drives which has a select-able sector size.  The new 7200.14 series is
 4k only.  

Does this mean they actually present with 4k sectors externally,
rather than use 4k internally and emulate 512b externally?  If so,
this is a good thing - and good to know.

 WD for the time being appears to still present 512 byte
 sectors in their current lineup. What kind of performance penalty this
 carries I don't know as we have not tested any as of yet.  Presumably
 though, WD is going to stop doing that eventually just like Seagate
 already has.

One hopes so.

There are two problems using ZFS on drives with 4k sectors:

 1) if the drive lies and presents 512-byte sectors, and you don't
manually force ashift=12, then the emulation can be slow (and
possibly error prone). There is essentially an internal RMW cycle
when a 4k sector is partially updated.  We use ZFS to get away
from the perils of RMW :) 

 2) with ashift=12, whether forced manually or automatically because
the disks present 4k sectors, ZFS is less space-efficient for
metadata and keeps fewer historical uberblocks.

For choosing a tradeoff today, I'll take 2 over 1, after experience
with both. 1 bites, seemingly especially with raidz types, but also
with mirrors.  Also because a code change could at least improve the
metadata packing in future.
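
The cost in point 2 can be sketched numerically. The snippet assumes the uberblock ring in each vdev label is 128 KiB and that each slot is the larger of 1 KiB and 2^ashift bytes; both constants are assumptions about the on-disk layout, not figures from this thread, and the function names are hypothetical.

```shell
# Smallest allocation unit for a given ashift, in bytes.
min_block_bytes() {
    echo $(( 1 << $1 ))
}

# Uberblock slots per label, assuming a 128 KiB ring and a slot size of
# max(1 KiB, 2^ashift). Both constants are assumptions.
uberblock_slots() {
    slot_shift=$1
    if [ "$slot_shift" -lt 10 ]; then
        slot_shift=10
    fi
    echo $(( (128 * 1024) >> slot_shift ))
}

min_block_bytes 9     # 512
min_block_bytes 12    # 4096
uberblock_slots 9     # 128
uberblock_slots 12    # 32
```

Under those assumptions a small metadata block that fits in 512 bytes occupies eight times the space at ashift=12, and the pool keeps a quarter as many historical uberblocks.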

AFAIK, Hitachi is the only vendor still offering 512-native consumer
drives in the 2-3T sizes.  They cost a little more, so that's another
tradeoff. 

--
Dan.



Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-28 Thread Nigel W
On Mon, May 28, 2012 at 6:13 PM, Daniel Carosone d...@geek.com.au wrote:
 On Mon, May 28, 2012 at 09:23:25AM -0600, Nigel W wrote:
 After a snafu
 last week at $work where a 512 byte pool would not resilver with a 4K
 drive plugged in, it appears that (keep in mind that these are
 consumer drives) Seagate no longer manufactures the 7200.12 series
 drives which has a select-able sector size.  The new 7200.14 series is
 4k only.

 Does this mean they actually present with 4k sectors externally,
 rather than use 4k internally and emulate 512b externally?  If so,
 this is a good thing - and good to know.

Based on the numbers stamped on the drives and on Seagate support, yes:
the 7200.14 presents 4k sectors and the 7200.12 has a jumper that
switches between 512 and 4k, though I don't know whether that means the
disk is 4k or 512 internally.

On Mon, May 28, 2012 at 6:13 PM, Daniel Carosone d...@geek.com.au wrote:
 There are two problems using ZFS on drives with 4k sectors:

  1) if the drive lies and presents 512-byte sectors, and you don't
    manually force ashift=12, then the emulation can be slow (and
    possibly error prone). There is essentially an internal RMW cycle
    when a 4k sector is partially updated.  We use ZFS to get away
    from the perils of RMW :)

  2) with ashift=12, whether forced manually or automatically because
    the disks present 4k sectors, ZFS is less space-efficient for
    metadata and keeps fewer historical uberblocks.

 For choosing a tradeoff today, I'll take 2 over 1, after experience
 with both. 1 bites, seemingly especially with raidz types, but also
 with mirrors.  Also because a code change could at least improve the
 metadata packing in future.

Yes, that would suck the performance out, and it is something that we
have discussed at $work, though so far it seems we have just lucked out
and haven't seen performance issues as a result of this.

On Mon, May 28, 2012 at 6:13 PM, Daniel Carosone d...@geek.com.au wrote:
 AFAIK, Hitachi is the only vendor still offering 512-native consumer
 drives in the 2-3T sizes.  They cost a little more, so that's another
 tradeoff.
Hmm. That is interesting to know. At the very least, it is another
possible source of 512-byte drives if we need them for replacing drives
in pools that are stuck with ashift=9.


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-28 Thread Bill Sommerfeld

On 05/28/12 17:13, Daniel Carosone wrote:

There are two problems using ZFS on drives with 4k sectors:

  1) if the drive lies and presents 512-byte sectors, and you don't
 manually force ashift=12, then the emulation can be slow (and
 possibly error prone). There is essentially an internal RMW cycle
 when a 4k sector is partially updated.  We use ZFS to get away
 from the perils of RMW :)

  2) with ashift=12, whether forced manually or automatically because
 the disks present 4k sectors, ZFS is less space-efficient for
 metadata and keeps fewer historical uberblocks.


Two more specific problems I've run into recently:

 1) if you move a disk with an ashift=9 pool on it from a 
controller/enclosure/.. combo where it claims to have 512 byte sectors 
to a path where it is detected as having 4k sectors (even if it can cope 
with 512-byte aligned I/O), the pool will fail to import and appear to 
be gravely corrupted; the error message you get will make no mention of 
the sector size change.  Move the disk back to the original location and 
it imports cleanly.


 2) if you have a pool with ashift=9 and a disk dies, and the intended 
replacement is detected as having 4k sectors, it will not be possible to 
attach the disk as a replacement drive.
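
The second problem can be modelled as a simple compatibility rule: a disk can only join a vdev if the sector size it is detected with is no larger than the block size implied by the pool's ashift. The sketch below is a simplified model of the behaviour described above, not the actual ZFS check, and the function name can_replace is hypothetical.

```shell
# Simplified model: a replacement disk is accepted only if its detected
# sector size is no larger than 2^ashift of the existing vdev.
can_replace() {
    pool_ashift=$1
    disk_sector_bytes=$2
    if [ "$disk_sector_bytes" -le $(( 1 << pool_ashift )) ]; then
        echo "ok"
    else
        echo "refused"
    fi
}

can_replace 9 512     # ashift=9 pool, 512-byte disk
can_replace 9 4096    # ashift=9 pool, disk detected as 4k
can_replace 12 512    # ashift=12 pool, 512-byte disk
```

The asymmetry is the practical point: under this model a pool created with ashift=12 accepts either kind of disk later, while an ashift=9 pool does not.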

