[zfs-discuss] Drive MTBF WAS: 350TB+ storage solution

2011-05-18 Thread Paul Kraus
On Mon, May 16, 2011 at 8:45 PM, Jim Klimov jimkli...@cos.ru wrote:

 If MTBFs were real, we'd never see disks failing within a year ;)

Remember that MTBF (and MTTR and MTTDL) are *statistics* and not
guarantees. If a type of drive has an MTBF of 10 years, then the MEAN
(average) time between failures for a _big_enough_sample_ set will be
10 years. Of course these failures will be both front and back end
loaded :-) I use MTBF as a *relative* measure of drives: a population of
drives with a 10 year MTBF should, on average, last about twice as long
as one with a 5 year MTBF.
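
To put a rough number on why both statements can be true (this assumes
the usual constant-failure-rate model, which is my simplification and
not anything the vendors publish): even a hypothetical 1,000,000 hour
MTBF works out to roughly a 1% chance of any given drive dying in its
first year, so a big enough population *will* show first-year failures.

awk 'BEGIN { mtbf = 1000000; afr = 1 - exp(-8760 / mtbf);
             printf "AFR = %.2f%%\n", afr * 100 }'
AFR = 0.87%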

 Problem is, these values seem to be determined in an ivory-tower
 lab. An expensive-vendor edition of a drive running in a cooled
 data center with shock absorbers and other nice features does
 often live a lot longer than a similar OEM enterprise or consumer
 drive running in an apartment with varying weather, frequent
 overheating, and random vibration from a dozen other
 disks spinning in the same box.

Actually, I'll bet the values are calculated based on the MTBF of
the components of the drive. And those MTBF values are calculated or
estimated based on accelerated aging tests :-) So the final MTBF is
a guess based on multiple assumptions :-) You really can't measure
MTBF except after the fact.
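
(If you want to see why the drive-level number is so sensitive to those
assumptions: for independent components in series the failure rates
simply add, so with three made-up component MTBFs you get

awk 'BEGIN { split("2000000 1500000 3000000", m);
             for (i in m) rate += 1 / m[i];
             printf "drive MTBF = %d hours\n", 1 / rate }'
drive MTBF = 666666 hours

which is worse than any single component, and entirely driven by
whichever numbers were plugged in.)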

 The ramble about expensive-vendor drive editions comes from
 my memory of some forum or blog discussion (which I can't point
 to now) suggesting that vendors like Sun do not charge 5x-10x
 the price of the same drive under an OEM label just
 for a nice corporate logo stamped onto the disk.

The firmware on a Sun badged drive is *different* from the generic
version. I expect the same is true of IBM and HP, and maybe (but probably
not) Dell. The Sun drives return a Sun identity in response to a SCSI
INQUIRY command: the Vendor ID and Product ID (VID and PID) are
different from a generic drive's. Also, in the case of Sun, if Sun sells a
drive as a 72 GB, then no matter the manufacturer, the number of
blocks will match (although they have screwed that up on a couple of
occasions), permitting any manufacturer's Sun 72 GB drive to swap for
any other.
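
Easy to check on a live system, by the way: run

iostat -En

and the Vendor:, Product: and Revision: fields in each device's entry
come straight from the inquiry data, so a Sun badged drive stands out
immediately next to a generic one.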

 Vendors were
 said to burn in the drives in their labs for half a year or a
 year before putting the survivors on the market. This implies
 that some of the drives did not survive the burn-in period, and
 indeed the MTBF for the remaining ones is higher, because
 infant mortality due to manufacturing problems soon after
 the drives reach the end customer is unlikely for these
 particular tested devices.

I have never heard this and my experience does not support it. I
_have_ seen infant mortality rates with Sun badged drives consistent
with the overall drive market.

 The long burn-in times were also said to
 be part of the reason why vendors never sell the biggest
 disks available on the market (does any vendor sell 3TB
 under their own brand yet? Sun-Oracle? IBM? HP?)
 This may be disguised as a certification process, which
 can take about as long - to see whether the newest
 and greatest disks die within a year or so.

I suspect there are three reasons for the delay:
1) certification (let's make sure these new drives really work)
2) time to build the Sun firmware for the new drive
3) supply chain delays (Sun needs a new P/N and new lists of what
works with what)

I think the largest contributor to the price difference between a
Seagate drive with a Seagate badge and one with a Sun badge is the
profit inherent in additional layers of markup. Remember (at least
before Oracle), when you bought a drive from CDW or Newegg the profit
chain was:

1) Seagate
2) Newegg

But from Sun you were looking at:

1) Seagate (and Sun paid more here for their custom FW)
2) Sun
3) Master Reseller
4) Reseller

I think that even Sun direct accounts were shipped via a Master
Reseller. I don't think Sun ever maintained their own warehouse of
stuff (at least since 1995 when I first started dealing with them).

 Another implied idea in that discussion was that the vendors
 can influence the OEMs' choice of components, an example
 in the thread being different grades of steel for the
 ball bearings. Such choices can drive the price up for
 a reason - disks like that are more expensive to produce -
 but they also increase reliability.

Hurmmm, I would love to take a Seagate ES-2 series drive and a Sun
badged version of the same drive apart and see. (feel free to
substitute whatever the base Seagate model is for the Sun drive).

 In fact, I've had very few Sun disks fail in the boxes
 I've managed over 10 years; all I can remember now were
 two or three 2.5" 72 GB Fujitsus with a Sun brand. Still, we
 have another dozen of those that have been running for several years.

I have seen the typical failure curve (ignoring the occasional bad
batch that has an 80% infant mortality rate), with about 4% - 5%
infant mortality in the first year (I just put 5 x J4400 with 120
drives on line in the past year and had 5 

[zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Paul Kraus
Over the past few months I have seen mention of FreeBSD a couple
of times in regard to ZFS. My question is: how stable (reliable) is ZFS
on this platform?

This is for a home server, and the reason I am asking is that about
a year ago I bought some hardware based on its inclusion on the
Solaris 10 HCL, as follows:

SuperMicro 7045A-WTB (I would have preferred the server
version, but it wasn't on the HCL)
Two quad core 2.0 GHz Xeon CPUs
8 GB RAM (I am NOT planning on using DeDupe)
2 x Seagate ES-2 250 GB SATA drives for the OS
4 x Seagate ES-2 1 TB SATA drives for data
Nvidia Geforce 8400 (cheapest video card I could get locally)

I could not get the current production Solaris or OpenSolaris to
load. The miniroot would GPF while loading the kernel. I could not get
the problem resolved and needed to get the server up and running as my
old server was dying (dual 550 MHz P3 with 1 GB RAM) and I needed to
get my data (about 600 GB) off of it before I lost anything. That old
server was running Solaris 10 and the data was in a zpool with
mirrored vdevs of different sized drives. I had lost one drive in each
vdev and zfs saved my data. So I loaded OpenSuSE and moved the data to
a mirrored pair of 1 TB drives.

I still want to move my data to ZFS, and push has come to shove,
as I am about to overflow the 1 TB mirror and I really, really hate
the Linux options for multiple disk device management (I'm spoiled by
SVM and ZFS). So now I really need to get that hardware loaded with an
OS that supports ZFS. I have tried every variation of Solaris that I
can get my hands on, including Solaris 11 Express and Nexenta 3, and
they all GPF while loading the kernel to run the installer. My last
hope is to swap a very plain vanilla (ancient S540) video card in for
the Nvidia, on the long-shot chance that it is the problem. But I
need a backup plan if that does not work.

I have tested the hardware with FreeBSD 8 and it boots to the
installer. So my question is whether the FreeBSD ZFS port is up to
production use. Is there anyone here using FreeBSD in production with
good results (this list tends to only hear about serious problems and
not success stories)?

P.S. If anyone here has a suggestion as to how to get Solaris to load
I would love to hear it. I even tried disabling multi-cores (which
makes the CPUs look like dual core instead of quad) with no change. I
have not been able to get serial console redirect to work so I do not
have a good log of the failures.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Tim Cook
On Wed, May 18, 2011 at 7:47 AM, Paul Kraus p...@kraus-haus.org wrote:

Over the past few months I have seen mention of FreeBSD a couple
 of times in regard to ZFS. My question is how stable (reliable) is ZFS on
 this platform ?

This is for a home server and the reason I am asking is that about
 a year ago I bought some hardware based on its inclusion on the
 Solaris 10 HCL, as follows:

 SuperMicro 7045A-WTB (although I would have preferred the server
 version, but it wasn't on the HCL)
 Two quad core 2.0 GHz Xeon CPUs
 8 GB RAM (I am NOT planning on using DeDupe)
 2 x Seagate ES-2 250 GB SATA drives for the OS
 4 x Seagate ES-2 1 TB SATA drives for data
 Nvidia Geforce 8400 (cheapest video card I could get locally)

I could not get the current production Solaris or OpenSolaris to
 load. The miniroot would GPF while loading the kernel. I could not get
 the problem resolved and needed to get the server up and running as my
 old server was dying (dual 550 MHz P3 with 1 GB RAM) and I needed to
 get my data (about 600 GB) off of it before I lost anything. That old
 server was running Solaris 10 and the data was in a zpool with
 mirrored vdevs of different sized drives. I had lost one drive in each
 vdev and zfs saved my data. So I loaded OpenSuSE and moved the data to
 a mirrored pair of 1 TB drives.

I still want to move my data to ZFS, and push has come to shove,
 as I am about to overflow the 1 TB mirror and I really, really hate
 the Linux options for multiple disk device management (I'm spoiled by
 SVM and ZFS). So now I really need to get that hardware loaded with an
 OS that supports ZFS. I have tried every variation of Solaris that I
 can get my hands on including Solaris 11 Express and Nexenta 3 and
 they all GPF loading the kernel to run the installer. My last hope is
 that I have a very plain vanilla (ancient S540) video card to swap in
 for the Nvidia on the very long shot chance that is the problem. But I
 need a backup plan if that does not work.

I have tested the hardware with FreeBSD 8 and it boots to the
 installer. So my question is whether the FreeBSD ZFS port is up to
 production use ? Is there anyone here using FreeBSD in production with
 good results (this list tends to only hear about serious problems and
 not success stories) ?

 P.S. If anyone here has a suggestion as to how to get Solaris to load
 I would love to hear it. I even tried disabling multi-cores (which
 makes the CPUs look like dual core instead of quad) with no change. I
 have not been able to get serial console redirect to work so I do not
 have a good log of the failures.

 --

 {1-2-3-4-5-6-7-}
 Paul Kraus
 - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
 - Sound Coordinator, Schenectady Light Opera Company (
 http://www.sloctheater.org/ )
 - Technical Advisor, RPI Players



I've heard nothing but good things about it.  FreeNAS uses it:
http://freenas.org/ and iXsystems sells a commercial product based on the
FreeNAS/FreeBSD code.  I don't think they have a full-blown implementation
of CIFS (just Samba), but other than that, I don't think you'll have too
many issues.  I actually considered moving over to it, but I made the
unfortunate mistake of upgrading to Solaris 11 Express, which means my zpool
version is now too new to run anything else (AFAIK).
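
For anyone in the same boat, the version dance is easy to check before
committing to a move (the pool name here is a placeholder):

zpool get version mypool     # what the pool is at on disk right now
zpool upgrade -v             # the versions this OS build understands

and a new pool can be pinned to an older on-disk format up front with
zpool create -o version=15 ... if you think you may want to import it
somewhere older later.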

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Bob Friesenhahn

On Wed, 18 May 2011, Paul Kraus wrote:


   Over the past few months I have seen mention of FreeBSD a couple
time in regards to ZFS. My question is how stable (reliable) is ZFS on
this platform ?


This would be a very excellent question to ask on the related FreeBSD 
mailing list (freebsd...@freebsd.org).



   I have tested the hardware with FreeBSD 8 and it boots to the
installer. So my question is whether the FreeBSD ZFS port is up to
production use ? Is there anyone here using FreeBSD in production with
good results (this list tends to only hear about serious problems and
not success stories) ?


I have been on the freebsd-fs mailing list for quite some time now and 
it does seem that there are quite a few happy FreeBSD zfs users. 
There are also some users who experience issues.  FreeBSD zfs may 
require more tuning (depending on your hardware) than Solaris zfs for 
best performance.


If you are very careful, you can create a zfs pool which can be used 
by FreeBSD or Solaris.
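
The essence of the careful part is pinning the pool to the lowest
common pool version and then never upgrading it from the newer OS.
A rough sketch, with placeholder device names (FreeBSD 8.2 is at pool
version 15 per the other posts in this thread):

zpool create -o version=15 tank mirror c5t0d0 c5t1d0
zpool get version tank

Partitioning and device labeling also differ between the two systems,
which is presumably where the rest of the care goes.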


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread a . smith

Hi,

  I am using FreeBSD 8.2 in production with ZFS. I have had
one issue with it in the past, but I would recommend it and I consider
it production ready. That said, if you can wait for FreeBSD 8.3 or 9.0
to come out (a few months away), you will get a better system, as these
will include ZFS v28 (FreeBSD-RELEASE is currently at v15).
On the other hand, things can always go wrong; of course RAID is not
backup, even with snapshots ;)
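
(For what it's worth, the usual belt and braces step once the data is
on ZFS is to replicate snapshots to a second pool or machine, roughly
like this, with placeholder names:

zfs snapshot -r tank@nightly
zfs send -R tank@nightly | ssh backuphost zfs receive -d backuppool

so a dead pool or a fat-fingered destroy doesn't take the only copy
with it.)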


cheers Andy.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Freddie Cash
On Wed, May 18, 2011 at 5:47 AM, Paul Kraus p...@kraus-haus.org wrote:
    Over the past few months I have seen mention of FreeBSD a couple
 time in regards to ZFS. My question is how stable (reliable) is ZFS on
 this platform ?

ZFSv15, as shipped with FreeBSD 8.2, is rock stable in our use.  We
have two servers running without any issues.  These are our backup
servers, doing rsync backups every night for ~130 remote Linux and
FreeBSD systems.  These are 5U rackmount boxes with:
  - Chenbro 5U storage chassis with 3-way redundant PSUs
  - Tyan h2000M motherboard
  - 2x AMD Opteron 2000-series CPUs (dual-core)
  - 8 GB ECC DDR2-SDRAM
  - 2x 8 GB CompactFlash (mirrored for OS install)
  - 2x 3Ware RAID controllers (12-port multi-lane)
  - 24x SATA harddrives (various sizes, configured in 3x 8-drive raidz2 vdevs)
  - FreeBSD 8.2 on both servers

ZFSv28, as shipped in FreeBSD -CURRENT (the development version that
will eventually become 9.0), is a little rough around the edges, but
is getting better over time.  There are also patches floating around
that allow you to use ZFSv28 with 8-STABLE (the development version
that will eventually become 8.3).  These are a little rougher around
the edges.

We have only been testing ZFS in storage servers for backups, but have
plans to start testing it in NFS servers with an eye toward creating
NAS/SAN setups for virtual machines.

I also run it on my home media server, which is nowhere near server
quality, without issues:
  - generic Intel motherboard
  - 2.8 GHz P4 CPU
  - 3 SATA1 harddrives connected to motherboard, in a raidz1 vdev
  - 2 IDE harddrives connected to a Promise PCI controller, in a mirror vdev
  - 2 GB non-ECC SDRAM
  - 2 GB USB stick for the OS install
  - FreeBSD 8.2

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 350TB+ storage solution

2011-05-18 Thread Chris Mosetick
The drives I just bought were half packed in white foam then wrapped
 in bubble wrap.  Not all edges were protected with more than bubble
 wrap.

Same here for me. I purchased 10 x 2TB Hitachi 7200rpm SATA disks from
Newegg.com in March. The majority of the drives were protected in white
foam; however, ~1/2 inch at each end of all the drives was only protected by
bubble wrap. A small batch of three disks I ordered in February (testing for
the larger order) was packed similarly, and I've already had to RMA one of
those drives. Newegg is moving in the right direction, but they still have a
ways to go in the packing department. I still love their prices!

-Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 350TB+ storage solution

2011-05-18 Thread Rich Teer
On Wed, 18 May 2011, Chris Mosetick wrote:

 to go in the packing dept. I still love their prices!

There's a reason for that: you don't get what you don't pay for!

-- 
Rich Teer, Publisher
Vinylphile Magazine

www.vinylphilemag.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Brandon High
On Wed, May 18, 2011 at 5:47 AM, Paul Kraus p...@kraus-haus.org wrote:
 P.S. If anyone here has a suggestion as to how to get Solaris to load
 I would love to hear it. I even tried disabling multi-cores (which
 makes the CPUs look like dual core instead of quad) with no change. I
 have not been able to get serial console redirect to work so I do not
 have a good log of the failures.

Have you checked your system in the HCL device tool at
http://www.sun.com/bigadmin/hcl/hcts/device_detect.jsp ? It should be
able to tell you which device is causing the problem. If I remember
correctly, you can feed it the output of 'lspci -vv -n'.

You may have to disable some on-board devices to get through the
installer, but I couldn't begin to guess which.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Garrett D'Amore
We might have a better chance of diagnosing your problem if we had a copy of
your panic message buffer.  Have you considered OpenIndiana and illumos as an
option, or even NexentaStor if you are just looking for a storage appliance
(though my guess is that you need more general-purpose compute capabilities)?
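
If there is no other way to capture it, one approach is to boot the
install kernel verbose with kmdb loaded and the console pointed at a
serial port, so the panic text lands somewhere it can be copied from.
Very roughly (the exact GRUB kernel line differs between releases, so
treat this as a sketch to adapt rather than a recipe):

  edit the kernel$ line in the GRUB menu and append:  -kv -B console=ttya

-v gives a verbose boot, -k should drop the box into kmdb at the trap
instead of rebooting, and from the kmdb prompt ::msgbuf and $c are the
interesting bits.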

  -- Garrett D'Amore

On May 18, 2011, at 2:48 PM, Paul Kraus p...@kraus-haus.org wrote:

 
Over the past few months I have seen mention of FreeBSD a couple
 of times in regard to ZFS. My question is how stable (reliable) is ZFS on
 this platform ?
 
This is for a home server and the reason I am asking is that about
 a year ago I bought some hardware based on its inclusion on the
 Solaris 10 HCL, as follows:
 
 SuperMicro 7045A-WTB (although I would have preferred the server
 version, but it wasn't on the HCL)
 Two quad core 2.0 GHz Xeon CPUs
 8 GB RAM (I am NOT planning on using DeDupe)
 2 x Seagate ES-2 250 GB SATA drives for the OS
 4 x Seagate ES-2 1 TB SATA drives for data
 Nvidia Geforce 8400 (cheapest video card I could get locally)
 
I could not get the current production Solaris or OpenSolaris to
 load. The miniroot would GPF while loading the kernel. I could not get
 the problem resolved and needed to get the server up and running as my
 old server was dying (dual 550 MHz P3 with 1 GB RAM) and I needed to
 get my data (about 600 GB) off of it before I lost anything. That old
 server was running Solaris 10 and the data was in a zpool with
 mirrored vdevs of different sized drives. I had lost one drive in each
 vdev and zfs saved my data. So I loaded OpenSuSE and moved the data to
 a mirrored pair of 1 TB drives.
 
I still want to move my data to ZFS, and push has come to shove,
 as I am about to overflow the 1 TB mirror and I really, really hate
 the Linux options for multiple disk device management (I'm spoiled by
 SVM and ZFS). So now I really need to get that hardware loaded with an
 OS that supports ZFS. I have tried every variation of Solaris that I
 can get my hands on including Solaris 11 Express and Nexenta 3 and
 they all GPF loading the kernel to run the installer. My last hope is
 that I have a very plain vanilla (ancient S540) video card to swap in
 for the Nvidia on the very long shot chance that is the problem. But I
 need a backup plan if that does not work.
 
I have tested the hardware with FreeBSD 8 and it boots to the
 installer. So my question is whether the FreeBSD ZFS port is up to
 production use ? Is there anyone here using FreeBSD in production with
 good results (this list tends to only hear about serious problems and
 not success stories) ?
 
 P.S. If anyone here has a suggestion as to how to get Solaris to load
 I would love to hear it. I even tried disabling multi-cores (which
 makes the CPUs look like dual core instead of quad) with no change. I
 have not been able to get serial console redirect to work so I do not
 have a good log of the failures.
 
 -- 
 {1-2-3-4-5-6-7-}
 Paul Kraus
 - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
 - Sound Coordinator, Schenectady Light Opera Company (
 http://www.sloctheater.org/ )
 - Technical Advisor, RPI Players
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread Donald Stahl
Wow- so a bit of an update:

With the default scrub delay:
echo zfs_scrub_delay/K | mdb -kw
zfs_scrub_delay:20004

pool0       14.1T  25.3T    165    499  1.28M  2.88M
pool0       14.1T  25.3T    146      0  1.13M      0
pool0       14.1T  25.3T    147      0  1.14M      0
pool0       14.1T  25.3T    145      3  1.14M  31.9K
pool0       14.1T  25.3T    314      0  2.43M      0
pool0       14.1T  25.3T    177      0  1.37M  3.99K

The scrub continues on at about 250K/s - 500K/s

With the delay set to 1:

echo zfs_scrub_delay/W1 | mdb -kw

pool0       14.1T  25.3T    272      3  2.11M  31.9K
pool0       14.1T  25.3T    180      0  1.39M      0
pool0       14.1T  25.3T    150      0  1.16M      0
pool0       14.1T  25.3T    248      3  1.93M  31.9K
pool0       14.1T  25.3T    223      0  1.73M      0

The pool scrub rate climbs to about 800K/s - 1000K/s

If I set the delay to 0:

echo zfs_scrub_delay/W0 | mdb -kw

pool0       14.1T  25.3T  50.1K    116   392M   434K
pool0       14.1T  25.3T  49.6K      0   389M      0
pool0       14.1T  25.3T  50.8K     61   399M   633K
pool0       14.1T  25.3T  51.2K      3   402M  31.8K
pool0       14.1T  25.3T  51.6K      0   405M  3.98K
pool0       14.1T  25.3T  52.0K      0   408M      0

Now the pool scrub rate climbs to 100MB/s (in the brief time I looked at it).

Is there a setting somewhere between slow and ludicrous speed?

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread George Wilson
Don,

Try setting the zfs_scrub_delay to 1 but increase the
zfs_top_maxinflight to something like 64.
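
In mdb terms that would be something like the following, using the same
syntax you already used for the delay, with 0t marking the value as
decimal (assuming the tunable is still called zfs_top_maxinflight on
your build):

echo zfs_scrub_delay/W1 | mdb -kw
echo zfs_top_maxinflight/W0t64 | mdb -kw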

Thanks,
George

On Wed, May 18, 2011 at 5:48 PM, Donald Stahl d...@blacksun.org wrote:
 Wow- so a bit of an update:

 With the default scrub delay:
 echo zfs_scrub_delay/K | mdb -kw
 zfs_scrub_delay:20004

 pool0       14.1T  25.3T    165    499  1.28M  2.88M
 pool0       14.1T  25.3T    146      0  1.13M      0
 pool0       14.1T  25.3T    147      0  1.14M      0
 pool0       14.1T  25.3T    145      3  1.14M  31.9K
 pool0       14.1T  25.3T    314      0  2.43M      0
 pool0       14.1T  25.3T    177      0  1.37M  3.99K

 The scrub continues on at about 250K/s - 500K/s

 With the delay set to 1:

 echo zfs_scrub_delay/W1 | mdb -kw

 pool0       14.1T  25.3T    272      3  2.11M  31.9K
 pool0       14.1T  25.3T    180      0  1.39M      0
 pool0       14.1T  25.3T    150      0  1.16M      0
 pool0       14.1T  25.3T    248      3  1.93M  31.9K
 pool0       14.1T  25.3T    223      0  1.73M      0

 The pool scrub rate climbs to about 800K/s - 100K/s

 If I set the delay to 0:

 echo zfs_scrub_delay/W0 | mdb -kw

 pool0       14.1T  25.3T  50.1K    116   392M   434K
 pool0       14.1T  25.3T  49.6K      0   389M      0
 pool0       14.1T  25.3T  50.8K     61   399M   633K
 pool0       14.1T  25.3T  51.2K      3   402M  31.8K
 pool0       14.1T  25.3T  51.6K      0   405M  3.98K
 pool0       14.1T  25.3T  52.0K      0   408M      0

 Now the pool scrub rate climbs to 100MB/s (in the brief time I looked at it).

 Is there a setting somewhere between slow and ludicrous speed?

 -Don




-- 
George Wilson



M: +1.770.853.8523
F: +1.650.494.1676
275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-18 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
 New problem:
 
 I'm following all the advice I summarized into the OP of this thread, and
 testing on a test system.  (A laptop).  And it's just not working.  I am
 jumping into the dedup performance abyss far, far earlier than
predicted...

Now I'm repeating all these tests on a system that more closely resembles a
server.  This is a workstation with a 6-core processor, 16G RAM, and a single
1TB hard disk.

In the default configuration, arc_meta_limit is 3837MB.  And as I increase
the number of unique blocks in the data pool, it is perfectly clear that
performance jumps off a cliff when arc_meta_used starts to reach that level,
which is approx 880,000 to 1,030,000 unique blocks.  FWIW, this means,
without evil tuning, a 16G server is only sufficient to run dedup on approx
33GB to 125GB of unique data without severe performance degradation.  I'm
calling severe degradation anything that's an order of magnitude or worse.
(That's 40K average block size * 880,000 unique blocks, and 128K average
block size * 1,030,000 unique blocks.)
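
The arithmetic behind those two bounds, for anyone checking my numbers:

awk 'BEGIN { printf "%.1f GiB  %.1f GiB\n",
             40 * 1024 * 880000 / 2^30, 128 * 1024 * 1030000 / 2^30 }'
33.6 GiB  125.7 GiB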

So clearly this needs to be addressed, if dedup is going to be super-awesome
moving forward.

But I didn't quit there.

So then I tweak the arc_meta_limit.  Set to 7680MB.  And repeat the test.
This time, the edge of the cliff is not so clearly defined, something like
1,480,000 to 1,620,000 blocks.  But the problem is - arc_meta_used never
even comes close to 7680MB.  At all times, I still have at LEAST 2G unused
free mem.

I have 16G physical mem, but at all times I have at least 2G free.
My arcstats:c_max is 15G, but my arc size never exceeds 8.7G.
My arc_meta_limit is 7680 MB, but my arc_meta_used never exceeds 3647 MB.

So what's the holdup?
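
For anyone reproducing this: all of the numbers above are visible via
kstat (statistic names as they appear on my builds), and zdb has its
own view of the DDT footprint ("tank" stands in for the test pool):

kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
kstat -p zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit
zdb -DD tank     # prints DDT entry count and in-core size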

All of the above is, of course, just a summary.  If you want complete
overwhelming details, here they are:
http://dl.dropbox.com/u/543241/dedup%20tests/readme.txt

http://dl.dropbox.com/u/543241/dedup%20tests/datagenerate.c
http://dl.dropbox.com/u/543241/dedup%20tests/getmemstats.sh
http://dl.dropbox.com/u/543241/dedup%20tests/parse.py
http://dl.dropbox.com/u/543241/dedup%20tests/runtest.sh

http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-output-1st-pass.txt
http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-output-1st-pass-parsed.xlsx

http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-output-2nd-pass.txt
http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-output-2nd-pass-parsed.xlsx


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread Donald Stahl
 Try setting the zfs_scrub_delay to 1 but increase the
 zfs_top_maxinflight to something like 64.
The array is running some regression tests right now but when it
quiets down I'll try that change.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread Donald Stahl
 Try setting the zfs_scrub_delay to 1 but increase the
 zfs_top_maxinflight to something like 64.
With the delay set to 1 or higher it doesn't matter what I set the
maxinflight value to- when I check with:

echo "::walk spa | ::print spa_t spa_name spa_last_io spa_scrub_inflight" | mdb -k

The value returned is only ever 0, 1 or 2.

If I set the delay to zero, but drop the maxinflight to 8, then the
read rate drops from 400MB/s to 125MB/s.

If I drop it again to 4- then the read rate drops to a much more
manageable 75MB/s.

The delay seems to be useless on this array- but the maxinflight makes
a big difference.

At 16 my read rate is 300MB/s. At 32 it goes up to 380MB/s. Beyond 32 it
doesn't seem to change much- it seems to level out at about 400MB/s and
50K reads/s:

pool0   14.1T  25.3T  51.2K  4   402M  35.8K
pool0   14.1T  25.3T  51.9K  3   407M  31.8K
pool0   14.1T  25.3T  52.1K  0   409M  0
pool0   14.1T  25.3T  51.9K  2   407M   103K
pool0   14.1T  25.3T  51.7K  3   406M  31.9K

I'm going to leave it at 32 for the night- as that is a quiet time for us.

In fact I will probably leave it at 32 all the time. Since our array
is very quiet on the weekends I can start a scan on Friday night and
be done long before Monday morning rolls around. For us that's
actually much more useful than having the scrub throttled at all
times, but taking a month to finish.

Thanks for the suggestions.

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss