Re: [zfs-discuss] ashift and vdevs

2010-12-02 Thread Miles Nordin
 dm == David Magda dma...@ee.ryerson.ca writes:

dm The other thing is that with the growth of SSDs, if more OS
dm vendors support dynamic sectors, SSD makers can have
dm different values for the sector size 

okay, but if the size of whatever you're talking about is a multiple
of 512, we don't actually need (or, probably, want!) any SCSI sector
size monkeying around.  Just establish a minimum write size in the
filesystem, and always write multiple aligned 512-sectors at once
instead.

the 520-byte sectors you mentioned can't be accommodated this way, but
for 4kByte it seems fine.
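
As a minimal sketch of the arithmetic involved (plain shell, nothing
ZFS-specific): ashift is the base-2 log of the minimum write size, so

  echo $(( 1 << 9 ))    # ashift=9  -> 512-byte minimum writes
  echo $(( 1 << 12 ))   # ashift=12 -> 4096 bytes, i.e. 8 aligned 512-byte sectors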

dm to allow for performance
dm changes as the technology evolves.  Currently everything is
dm hard-coded,

XFS is hardcoded.  NTFS has settable block size.  ZFS has ashift
(almost).  ZFS slog is apparently hardcoded though.  so, two of those
four are not hardcoded, and the two hardcoded ones are hardcoded to
4kByte.

dm Until you're in a virtualized environment. I believe that in
dm the combination of NetApp and VMware, a 64K alignment is best
dm practice, last I heard. Similarly with the various stripe widths
dm available on traditional RAID arrays, it could be advantageous
dm for the OS/FS to know it.

There is another setting in XFS for RAID stripe size, but I don't know
what it does.  It's separate from the (unsettable) XFS block size
setting.  so...this 64kByte thing might not be the same thing as what
we're talking about so far...though in terms of aligning partitions
it's the same, I guess.




Re: [zfs-discuss] ashift and vdevs

2010-12-01 Thread Roch

Brandon High writes:
  On Tue, Nov 23, 2010 at 9:55 AM, Krunal Desai mov...@gmail.com wrote:
   What is the upgrade path like from this? For example, currently I
  
  The ashift is set in the pool when it's created and will persist
  through the life of that pool. If you set it at pool creation, it will
  stay regardless of OS upgrades.
  

It is indeed persistent, but each top-level vdev (mirror or
raid-z group, or drive in a stripe) will have its own value,
based on the sector size at the time the vdev was integrated into the
pool. The sector size of a vdev that is already part of a pool
had better not increase (or the vdev will be faulted).
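
A quick way to confirm the per-vdev values is something like this
(a sketch; the pool name is just an example):

  zdb -C tank | grep ashift    # should show one ashift entry per top-level vdev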

-r

  -B
  
  --
  Brandon High : bh...@freaks.com



Re: [zfs-discuss] ashift and vdevs

2010-12-01 Thread Miles Nordin
 kd == Krunal Desai mov...@gmail.com writes:

kd http://support.microsoft.com/kb/whatever

dude. seriously?

This is worse than a waste of time.  Don't read a URL that starts this
way.

kd Windows 7 (even with SP1) has no support for 4K-sector
kd drives.

NTFS has 4KByte allocation units, so all you have to do is make sure
the NTFS partition starts at an LBA that's a multiple of 8, and you
have full performance.  Probably NTFS is the reason WD has chosen
4kByte.
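
As a rough check of that alignment (a sketch only; the device name is an
example), one can print each slice's starting LBA and test whether it is a
multiple of 8:

  prtvtoc /dev/rdsk/c0t0d0s2 | \
    awk '!/^\*/ && NF { printf "slice %s starts at %s (%s)\n", $1, $4, ($4 % 8 == 0 ? "aligned" : "NOT aligned") }'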

Linux XFS is also effectively locked at a 4kByte block size, because that's
the VM page size and XFS cannot use a block size larger than the page size.
so, 4kByte is good (except for ZFS).

kd can you explicate further about these drives and their
kd emulation (or lack thereof), I'd appreciate it!

further explication: all drives will have the emulation, or else you
wouldn't be able to boot from them.  The world of peecees isn't as
clean as you imagine.

kd which 4K sector drives offer a jumper or other method to
kd completely disable any form of emulation and appear to the
kd host OS as a 4K-sector drive?

None that I know of.  It's probably simpler and less silly to leave
the emulation in place forever than start adding jumpers and modes and
more secret commands.

It doesn't matter what sector size the drive presents to the host OS
because you can get the same performance characteristics by always writing
an aligned set of 8 sectors at once, which is what people are trying
to force ZFS to do by adding 3 to ashift.  Whether the number is
reported by some messy newly-invented SCSI command, input by the
operator, or derived from a mini-benchmark added to
format/fmthard/zpool/whatever-applies-the-label, this is done once for
the life of the disk; after that, whenever the OS needs the number it
gets it by issuing READ on the label.  Day-to-day, the drive doesn't
need to report it.  Therefore, it is ``ability to accommodate a
minimum aligned write size'' which people badly want added to their
operating systems, and no one sane really cares about automatic
electronic reporting of the true sector size.
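
A crude sketch of what such a mini-benchmark might look like (illustrative
only: it overwrites the target, the device name is an example, and it assumes
Solaris-style raw devices and ptime):

  DEV=/dev/rdsk/c2t3d0s0                             # scratch disk only; this destroys its contents
  ptime dd if=/dev/zero of=$DEV bs=512  count=8192   # 4 MB issued as 512-byte writes
  ptime dd if=/dev/zero of=$DEV bs=4096 count=1024   # 4 MB issued as aligned 4kByte writes

If the first run is dramatically slower, the drive is presumably doing a
read-modify-write of a 4kByte physical sector for every 512-byte write,
whatever size it reports.  (The drive's write cache can hide the difference
for sequential writes, so a random-write test would be more conclusive.)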

Unfortunately (but predictably) it sounds like if you 'zfs replace' a
512-byte drive with a 4096-byte drive you are screwed.  therefore even
people with 512-byte drives might want to set their ashift for
4096-byte drives right now.  This is another reason it's a waste of
time to worry about reporting/querying a drive's ``true'' sector size:
for a pool of redundant disks, the needed planning's more complicated
than query-report-obey.

Also did anyone ever clarify whether the slog has an ashift?  or is it
forced-512?  or derived from whatever vdev will eventually contain the
separately-logged data?  I would expect generalized immediate Caring
about that since no slogs except ACARD and DDRDrive will have 512-byte
sectors.




Re: [zfs-discuss] ashift and vdevs

2010-12-01 Thread Neil Perrin

On 12/01/10 22:14, Miles Nordin wrote:

Also did anyone ever clarify whether the slog has an ashift?  or is it
forced-512?  or derived from whatever vdev will eventually contain the
separately-logged data?  I would expect generalized immediate Caring
about that since no slogs except ACARD and DDRDrive will have 512-byte
sectors.
  

The minimum slog write is

#define ZIL_MIN_BLKSZ 4096

and all writes are also rounded to multiples of ZIL_MIN_BLKSZ.
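
(As a back-of-the-envelope sketch of that rounding, in plain shell
arithmetic rather than the actual kernel code:

  ZIL_MIN_BLKSZ=4096
  sz=1500                                                                  # some hypothetical log write size
  echo $(( ((sz + ZIL_MIN_BLKSZ - 1) / ZIL_MIN_BLKSZ) * ZIL_MIN_BLKSZ ))   # -> 4096

so even a small log record occupies at least one 4kByte block on the slog.)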

Neil.


Re: [zfs-discuss] ashift and vdevs

2010-11-26 Thread Krunal Desai
 I'd also note that in the future at some point, we won't be able to purchase 
 512B drives any more. In particular, I think that 3TB drives will all be 4KB 
 formatted. So it isn't inadvisable for a pool that you plan on expanding to 
 have ashift=12 (imo).

One new thought occurred to me: I know some of the 4K drives emulate 512-byte 
sectors, so to the host OS they appear no different from other 512B 
drives. With this additional layer of emulation, I would assume that ashift 
wouldn't be needed, though I have read reports of the emulation hurting performance. I 
think I'll need to confirm exactly what each drive does and then decide on an 
ashift if needed.


Re: [zfs-discuss] ashift and vdevs

2010-11-26 Thread taemun
On 27 November 2010 08:05, Krunal Desai mov...@gmail.com wrote:

 One new thought occurred to me; I know some of the 4K drives emulate 512
 byte sectors, so to the host OS, they appear to be no different than other
 512b drives. With this additional layer of emulation, I would assume that
 ashift wouldn't be needed, though I have read reports of this affecting
 performance. I think I'll need to confirm what drives do what exactly and
 then decide on an ashift if needed.


Consider that for a drive with 4KB internal sectors and a 512B external
interface, a request for a 512B write results in the drive reading 4KB,
modifying it (putting the new 512B in), and then writing the 4KB out again.
This is terrible from a latency perspective. I recall seeing 20 IOPS on a WD
EARS 2TB drive (i.e., 50ms latency for random 512B writes).
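
(A rough sanity check on that figure, nothing more:

  echo $(( 1000 / 20 ))    # 20 IOPS -> 50 ms average service time per random 512B write

and since the read-modify-write adds roughly an extra full platter revolution,
about 11 ms at 5400rpm, on top of the usual seek and rotational latency, a
number in that range is plausible.)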


Re: [zfs-discuss] ashift and vdevs

2010-11-26 Thread Krunal Desai
On Nov 26, 2010, at 20:09 , taemun wrote:
 If you consider that for a 4KB internal drive, with a 512B external 
 interface, a request for a 512B write will result in the drive reading 4KB, 
 modifying it (putting the new 512B in) and then writing the 4KB out again. 
 This is terrible from a latency perspective. I recall seeing 20 IOPS on a WD 
 EARS 2TB drive (ie, 50ms latency for random 512B writes).

Agreed. However, if you look at this MS KB article: 
http://support.microsoft.com/kb/982018/en-us , Windows 7 (even with SP1) has no 
support for native (non-emulated) 4K-sector drives. Obviously we're dealing with ZFS and Solaris/BSD 
here, but what I'm getting at is: which 4K-sector drives offer a jumper or 
other method to completely disable any form of emulation and appear to the host 
OS as a native 4K-sector drive?

I believe the Barracuda LP (5900rpm) disks can do this, but I'm not sure 
about others like the F4s. I believe you said earlier that you were using F4s 
(the HD204UIs) and the 5900rpm Seagates; if you can explicate further about 
these drives and their emulation (or lack thereof), I'd appreciate it!

--khd


Re: [zfs-discuss] ashift and vdevs

2010-11-26 Thread Brandon High
On Tue, Nov 23, 2010 at 9:55 AM, Krunal Desai mov...@gmail.com wrote:
 What is the upgrade path like from this? For example, currently I

The ashift is set in the pool when it's created and will persist
through the life of that pool. If you set it at pool creation, it will
stay regardless of OS upgrades.

-B

--
Brandon High : bh...@freaks.com


[zfs-discuss] ashift and vdevs

2010-11-23 Thread taemun
zdb -C shows an ashift value on each vdev in my pool; I was just wondering if
it is vdev-specific or pool-wide. Google didn't seem to know.

I'm considering a mixed pool with some advanced format (4KB sector)
drives, and some normal 512B sector drives, and was wondering if the ashift
can be set per vdev, or only per pool. Theoretically, this would save me
some size on metadata on the 512B sector drives.

Cheers,


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread David Magda
On Tue, November 23, 2010 08:53, taemun wrote:
 zdb -C shows an shift value on each vdev in my pool, I was just wondering
 if it is vdev specific, or pool wide. Google didn't seem to know.

 I'm considering a mixed pool with some advanced format (4KB sector)
 drives, and some normal 512B sector drives, and was wondering if the
 ashift can be set per vdev, or only per pool. Theoretically, this would
 save me some size on metadata on the 512B sector drives.

It's a per-pool property, and currently hard-coded to a value of nine
(i.e., 2^9 = 512). Sun/Oracle are aware of the new, upcoming sector sizes
and some changes have been made in the code:

a. PSARC/2008/769: Multiple disk sector size support
http://arc.opensolaris.org/caselog/PSARC/2008/769/
b. PSARC/2010/296: Add tunable to control RMW for Flash Devices
http://arc.opensolaris.org/caselog/PSARC/2010/296/

(a) appears to have been fixed in snv_118 or so:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6710930

However, at this time there is no publicly available code that
dynamically determines the physical sector size and then adjusts ZFS pools
automatically. Even if there were, most disks don't support the necessary
ATA/SCSI command extensions to report the difference between physical and
logical sector sizes; AFAIK, they all simply report 512 when asked.
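
For what it's worth, a quick sketch of how one might check what a drive
reports (device names and paths are examples only):

  prtvtoc /dev/rdsk/c0t0d0s2 | grep 'bytes/sector'     # Solaris: the label's bytes-per-sector
  cat /sys/block/sda/queue/logical_block_size \
      /sys/block/sda/queue/physical_block_size         # Linux, for comparison

Even the Linux physical_block_size often comes back as 512 on these drives,
because the drive itself reports 512.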

If all of your disks will be 4K, you can hack together a solution to take
advantage of that fact:

http://tinyurl.com/25gmy7o
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html


Hopefully it'll make it into at least Solaris 11, as during the lifetime
of that product there will be even more disks with that property. There's
also the fact that many LUNs from SANs also have alignment issues, though
they tend to be at 64K. (At least that's what VMware and NetApp best
practices state.)




Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread Krunal Desai
Interesting, I didn't realize that Soracle was working on (or somewhat had in
place) a solution for 4K drives. I wonder which will happen
first for me: Hitachi 7K2000s hitting a reasonable price, or
4K/variable-size sector support landing so I can use Samsung F4s or
Barracuda LPs.

On Tue, Nov 23, 2010 at 9:40 AM, David Magda dma...@ee.ryerson.ca wrote:
 On Tue, November 23, 2010 08:53, taemun wrote:
 zdb -C shows an shift value on each vdev in my pool, I was just wondering
 if it is vdev specific, or pool wide. Google didn't seem to know.

 I'm considering a mixed pool with some advanced format (4KB sector)
 drives, and some normal 512B sector drives, and was wondering if the
 ashift can be set per vdev, or only per pool. Theoretically, this would
 save me some size on metadata on the 512B sector drives.

 It's a per-pool property, and currently hard coded to a value of nine
 (i.e., 2^9 = 512). Sun/Oracle are aware of the new, upcoming sector size/s
 and some changes have been made in the code:

 a. PSARC/2008/769: Multiple disk sector size support
        http://arc.opensolaris.org/caselog/PSARC/2008/769/
 b. PSARC/2010/296: Add tunable to control RMW for Flash Devices
        http://arc.opensolaris.org/caselog/PSARC/2010/296/

 (a) appears to have been fixed in snv_118 or so:

        http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6710930

 However, at this time, there is no publicly available code that
 dynamically determines physical sector size and then adjusts ZFS pools
 automatically. Even if there was, most disks don't support the necessary
 ATA/SCSI command extensions to report on physical and logical sizes
 differences. AFAIK, they all simply report 512 when asked.

 If all of your disks will be 4K, you can hack together a solution to take
 advantage of that fact:

 http://tinyurl.com/25gmy7o
 http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html


 Hopefully it'll make it into at least Solaris 11, as during the lifetime
 of that product there will be even more disks with that property. There's
 also the fact that many LUNs from SANs also have alignment issues, though
 they tend to be at 64K. (At least that's what VMware and NetApp best
 practices state.)






-- 
--khd


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread taemun
Cheers for the links David, but you'll note that I've commented on the blog
you linked (i.e., I was aware of it). The zpool-12 binary linked from
http://digitaldj.net/2010/11/03/zfs-zpool-v28-openindiana-b147-4k-drives-and-you/
worked perfectly on my SX11 installation. (It threw some error on b134, so it
relies on some external code to some extent.)

I'd note, for those who are going to try it, that the binary produces a pool of
as high a version as the system supports. I was surprised that it was higher
than the version for which it was compiled (i.e., b147 = zpool v28).

I'm currently populating a pool with a 9-wide raidz vdev of Samsung HD204UI
2TB (5400rpm, 4KB sector) and a 9-wide raidz vdev of Seagate LP ST32000542AS
2TB (5900 rpm, 4KB sector) which was created with that binary, and haven't
seen any of the performance issues I've had in the past with WD EARS drives.

It would be lovely if Oracle could see fit to implement correct detection
of these drives! Or, at the very least, to add an -o ashift=12 parameter to the
zpool create command.
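
For what it's worth, the interface being asked for would presumably look
something like this (purely illustrative; the stock zpool in b134/b147 accepts
no such option, and the device names are made up):

  zpool create -o ashift=12 tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

so that every top-level vdev created while the option is in effect gets a
4kByte minimum block size regardless of what the drives report.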


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread Krunal Desai
On Tue, Nov 23, 2010 at 9:59 AM, taemun tae...@gmail.com wrote:
 I'm currently populating a pool with a 9-wide raidz vdev of Samsung HD204UI
 2TB (5400rpm, 4KB sector) and a 9-wide raidz vdev of Seagate LP ST32000542AS
 2TB (5900 rpm, 4KB sector) which was created with that binary, and haven't
 seen any of the performance issues I've had in the past with WD EARS drives.
 It would be lovely if Oracle could see fit to implementing correct detection
 of these drives! Or, at the very least, an -o ashift=12 parameter in the
 zpool create function.

What is the upgrade path like from this? For example, currently I
have b134 OpenSolaris with 8 x 1.5TB drives in a raidz2 storage pool. I
would like to go to OpenIndiana and move that data to a new pool built
of three 6-drive raidz2 vdevs (using 2TB drives). I am going to stagger my
drive purchases to give my wallet a breather, so I would likely start
with two 6-drive raidz2 vdevs at the beginning. If I were to use that
binary/hack to force the ashift for 4K drives, would I still be able to
upgrade down the road to a zpool version that is properly aware of 4K
drives?

I know the safest route would be to just go with 512-byte sector
7K2000s, but their prices do not drop nearly as often as the LPs or
F4s do.