Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-29 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jim Klimov
 
 this is
 the part I am not certain about - it is roughly as cheap to READ the
 gzip-9 datasets as it is to read lzjb (in terms of CPU decompression).

Nope.  I know LZJB is not LZO, but my starting point is that LZO is 
specifically designed for super-fast, low-memory decompression.  (As claimed 
all over the LZO web page, as well as Wikipedia, and supported by my own 
personal experience using lzop.)

So for comparison to LZJB, see here:
http://denisy.dyndns.org/lzo_vs_lzjb/

LZJB is, at least according to these guys, even faster than LZO.  So I'm 
confident in concluding that lzjb (default) decompression is significantly 
faster than zlib (gzip) decompression.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-28 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jim Klimov
 
 I really hope someone better versed in compression - like Saso -
 would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in
 terms of read-speeds from the pools. My HDD-based assumption is
 in general that the less data you read (or write) on platters -
 the better, and the spare CPU cycles can usually take the hit.

Oh, I can definitely field that one - 
The lzjb compression (the default as long as you just turn compression on 
without specifying any other detail) is very fast compression, similar to lzo.  
It generally has no noticeable CPU overhead, and it saves you a lot of time and 
space on highly repetitive data such as text files (source code), sparse 
zero-filled files, and the like.  I personally always enable it: 
 compression=on

zlib (gzip) is more powerful, but *way* slower.  Even the fastest level, 
gzip-1, uses enough CPU cycles that you will probably be CPU limited rather 
than IO limited.  There are very few situations where this option is better 
than the default lzjb.
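
For what it's worth, setting these is a one-liner per dataset; a rough
sketch (pool/dataset names are made up):

    # default fast compression (lzjb on current builds)
    zfs set compression=on tank/data

    # or name the algorithm and level explicitly
    zfs set compression=lzjb tank/data
    zfs set compression=gzip-1 tank/logs     # fastest gzip level
    zfs set compression=gzip-9 tank/distros  # smallest output, most CPU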

Some data (anything that's already compressed: zip, gz, video files, JPEGs, 
encrypted files, etc.) is essentially incompressible with these algorithms.  
If that is the type of data you store, you should not use compression.

Probably not worth mentioning, but what the heck: if you normally have 
incompressible data and then one day you do a lot of stuff that's compressible 
(or vice versa), remember that the compression property is only applied during 
writes.  Once a block is written to the pool, compressed or uncompressed, it 
stays that way, even if you change the property later.
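
If you want to see whether it mattered, or re-apply a new setting to old
data, something along these lines works (dataset names invented):

    # ratio achieved on data already written
    zfs get compression,compressratio tank/data

    # only new writes pick up a changed property; rewriting the data
    # (e.g. a send/receive into a fresh dataset that inherits the
    # desired compression from its parent) recompresses everything
    zfs snapshot tank/data@move
    zfs send tank/data@move | zfs receive tank/data_new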

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-28 Thread Ian Collins

Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Jim Klimov

I really hope someone better versed in compression - like Saso -
would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in
terms of read-speeds from the pools. My HDD-based assumption is
in general that the less data you read (or write) on platters -
the better, and the spare CPU cycles can usually take the hit.

Oh, I can definitely field that one -
The lzjb compression (default compression as long as you just turn compression on without 
specifying any other detail) is very fast compression, similar to lzo.  It generally has 
no noticeable CPU overhead, but it saves you a lot of time and space for highly 
repetitive things like text files (source code) and sparse zero-filled files and stuff 
like that.  I personally always enable this.  compression=on

zlib (gzip) is more powerful, but *way* slower.  Even the fastest level gzip-1 
uses enough CPU cycles that you probably will be CPU limited rather than IO 
limited.


I haven't seen that for a long time.  When gzip compression was first 
introduced, it would cause writes on a Thumper to be CPU bound.  It was 
all but unusable on that machine.  Today with better threading, I barely 
notice the overhead on the same box.



There are very few situations where this option is better than the default lzjb.


That part I do agree with!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-28 Thread Jim Klimov

Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

There are very few situations where (gzip) option is better than the
default lzjb.


Well, for the most part my question regarded the slowness (or lack thereof)
of gzip DEcompression as compared to the lz* algorithms. If there are files
and data like the OS (LZ/GZ) image and program binaries, which are
written once but read many times, I don't really care how expensive
it is to write less data (and for an OI installation the difference
between lzjb and gzip-9 compression of /usr can be around or over
100 MB) - as long as I keep less data on-disk and have fewer IOs to
read in the OS during boot and work. Especially so, if - and this is
the part I am not certain about - it is roughly as cheap to READ the
gzip-9 datasets as it is to read lzjb (in terms of CPU decompression).
//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Eugen Leitl
 
 can I make e.g. LSI SAS3442E
 directly do SSD caching (it says something about CacheCade,
 but I'm not sure it's an OS-side driver thing), as it
 is supposed to boost IOPS? Unlikely shot, but probably
 somebody here would know.

Depending on the type of work you will be doing, the best performance thing you 
could do is to disable the ZIL (zfs set sync=disabled) and use SSDs for cache.  
But don't go *crazy* adding SSDs for cache, because they still have some 
in-memory footprint.  If you have 8G of RAM and 80G SSDs, maybe just use one 
of them for cache, and let the other 3 do absolutely nothing.  Better yet, put 
your OS on a mirrored pair of SSDs, use a mirrored pair of HDDs for the storage 
pool, and one SSD for cache.  Then you have one SSD unused, which you could 
optionally add as a dedicated log device to your storage pool.  There are 
specific situations where it is or is not OK to disable the ZIL - look around 
and ask here if you have any confusion about it.  
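
Roughly, with made-up device names, that layout would look like this (the
rpool mirror is created by the installer, the data pool by hand):

    # data pool: mirrored HDDs, one SSD as L2ARC cache
    zpool create tank mirror c0t2d0 c0t3d0
    zpool add tank cache c0t4d0

    # optionally, the spare SSD as a dedicated log device
    zpool add tank log c0t5d0

    # only if the workload can tolerate losing the last few seconds
    # of writes on a crash
    zfs set sync=disabled tank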

Don't do redundancy in hardware.  Let ZFS handle it.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Eugen Leitl
On Tue, Nov 27, 2012 at 12:12:43PM +, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Eugen Leitl
  
  can I make e.g. LSI SAS3442E
  directly do SSD caching (it says something about CacheCade,
  but I'm not sure it's an OS-side driver thing), as it
  is supposed to boost IOPS? Unlikely shot, but probably
  somebody here would know.
 
 Depending on the type of work you will be doing, the best performance thing 
 you could do is to disable zil (zfs set sync=disabled) and use SSD's for 
 cache.  But don't go *crazy* adding SSD's for cache, because they still have 
 some in-memory footprint.  If you have 8G of ram and 80G SSD's, maybe just 
 use one of them for cache, and let the other 3 do absolutely nothing.  Better 
 yet, make your OS on a pair of SSD mirror, then use pair of HDD mirror for 
 storagepool, and one SSD for cache.  Then you have one SSD unused, which you 
 could optionally add as dedicated log device to your storagepool.  There are 
 specific situations where it's ok or not ok to disable zil - look around and 
 ask here if you have any confusion about it.  
 
 Don't do redundancy in hardware.  Let ZFS handle it.

Thanks. I'll try doing that, and see how it works out.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Jim Klimov

Performance-wise, I think you should go for mirrors/raid10, and
separate the pools (i.e. an rpool mirror on SSD and a data mirror on
HDDs). If you have 4 SSDs, you might mirror the other couple for
zoneroots or some databases in datasets delegated into zones,
for example. Don't use dedup. Carve out some space for L2ARC.
As Ed noted, you might not want to dedicate much disk space to the
cache due to the RAM pressure it adds; however, spreading the IO
load between smaller cache partitions/slices on each SSD
may help your IOPS on average. Maybe go for compression.
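
A rough sketch of that layout, with placeholder device names (the slice
numbers are just an example):

    # data pool as one mirror; add more mirror pairs for raid10
    zpool create data mirror c0t2d0 c0t3d0

    # a small slice on each remaining SSD as L2ARC, spreading cache IO
    zpool add data cache c0t4d0s1 c0t5d0s1

    zfs set compression=on data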

I really hope someone better versed in compression - like Saso -
would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in
terms of read-speeds from the pools. My HDD-based assumption is
in general that the less data you read (or write) on platters -
the better, and the spare CPU cycles can usually take the hit.

I'd spread out the different data types (i.e. WORM programs,
WORM-append logs and other random-IO application data) into
various datasets with different settings, backed by different
storage - since you have the luxury.
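
For example (dataset names invented), matching the settings I describe
further below:

    zfs create -o compression=gzip-9 data/distros   # WORM programs
    zfs create -o compression=gzip-9 data/logs      # WORM-append logs
    zfs create -o compression=lzjb   data/appdata   # random-IO data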

Many best-practice documents (and the original Sol10/SXCE/LiveUpgrade
requirements) place the zoneroots on the same rpool so they can
be upgraded seamlessly as part of the OS image. However, you can
also delegate ZFS datasets into zones and/or have lofs mounts
from GZ to LZ (maybe needed for shared datasets like distros
and homes - and faster/more robust than NFS from GZ to LZ).
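
For instance, a delegated dataset and a lofs mount look roughly like this
in zonecfg (zone, dataset and path names are hypothetical):

    zonecfg -z myzone
    zonecfg:myzone> add dataset
    zonecfg:myzone:dataset> set name=data/zones/myzone/appdata
    zonecfg:myzone:dataset> end
    zonecfg:myzone> add fs
    zonecfg:myzone:fs> set type=lofs
    zonecfg:myzone:fs> set special=/export/distros
    zonecfg:myzone:fs> set dir=/distros
    zonecfg:myzone:fs> end
    zonecfg:myzone> commit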
For OS images (zoneroots) I'd use gzip-9 or better (likely lz4
when it gets integrated), same for logfile datasets, and lzjb,
zle or none for the random-IO datasets. For structured things
like databases I also research the block IO size and use that
(at dataset creation time) to reduce extra work with ZFS COW
during writes - at the expense of more metadata.
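
For example, for a database with 8 KB pages (dataset name invented):

    # set recordsize before loading data; files keep the block size
    # they were originally written with
    zfs create -o recordsize=8k -o compression=lzjb data/db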

You'll likely benefit from having OS images on SSDs, logs on
HDDs (including logs from the GZ and LZ OSes, to reduce needless
writes on the SSDs), and databases on SSDs. For other data types
it depends, and in general they would be helped by L2ARC on
the SSDs.

Also note that much of the default OS image is not really used
(e.g. X11 on headless boxes), so you might want to do weird
things with GZ or LZ rootfs data layouts - note that these might
puzzle your beadm/liveupgrade software, so you'll have to do
any upgrades with lots of manual labor :)

On a somewhat orthogonal route, I'd start by setting up a
generic dummy zone, perhaps with much unneeded software,
and zfs-cloning that to spawn application zones. This way
you only pay the footprint price once, at least until you
have to upgrade the LZ OSes - in that case it might be cheaper
(in terms of storage at least) to upgrade the dummy, clone it
again, and port the LZ's customizations (installed software)
by finding the differences between the old dummy and the current
zone state (zfs diff, rsync -cn, etc.). In such upgrades you're
really well served by storing volatile data in datasets separate
from the zone OS root - you just reattach these datasets to the
upgraded OS image and go on serving.
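
Roughly, with placeholder names:

    # prepare the dummy once, snapshot it, clone it per application zone
    zfs snapshot data/zones/dummy@golden
    zfs clone data/zones/dummy@golden data/zones/webzone
    zfs snapshot data/zones/webzone@deployed   # baseline for later diffs

    # later: what did this zone change since deployment?
    zfs diff data/zones/webzone@deployed data/zones/webzone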

As a particular example of something often upgraded and taking
considerable disk space per copy - I'd have the current JDK
installed in the GZ: either simply lofs-mounted from GZ to LZs,
or in a separate dataset, cloned and delegated into LZs (if
JDK customizations are further needed by some - but not all -
local zones, e.g. timezone updates, trusted CA certs, etc.).

HTH,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Jim Klimov

Now that I've thought of it some more, a follow-up is due on my advice:

1) While the best practices do(did) dictate to set up zoneroots in
   rpool, this is certainly not required - and I maintain lots of
   systems which store zones in separate data pools. This minimizes
   write-impact on rpools and gives the fuzzy feeling of keeping
   the systems safer from unmountable or overfilled roots.

2) Whether LZs and GZs are in the same rpool for you, or you stack
   tens of your LZ roots in a separate pool, they do in fact offer
   a nice target for dedup - with an expected large dedup ratio which
   would outweigh both the overheads and IO lags (especially if it
   is on an SSD pool) and the inconveniences of my approach with cloned
   dummy zones - especially upgrades thereof. Just remember to use
   the same compression settings (or lack of compression) on all
   zoneroots, so that the zfs blocks for OS image files would be
   the same and dedupable.
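
In practice that boils down to something like (pool name invented):

    # estimate the potential ratio first (can take a while)
    zdb -S zones
    # then enable dedup; keep compression identical on all zoneroots
    zfs set dedup=on zones
    zpool list zones    # the DEDUP column shows the achieved ratio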

HTH,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Fajar A. Nugraha
On Tue, Nov 27, 2012 at 5:13 AM, Eugen Leitl eu...@leitl.org wrote:
 Now there are multiple configurations for this.
 Some using Linux (root fs on a RAID10, /home on
 RAID 1) or zfs. Now zfs on Linux probably wouldn't
 do hybrid zfs pools (would it?)

Sure it does. You can even use the whole disk as zfs, with no
additional partition required (not even for /boot).
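
For example, a whole-disk hybrid pool under ZoL could look like this
(the device IDs are placeholders for your RE4 drives and Intel SSD):

    zpool create tank mirror \
        /dev/disk/by-id/ata-WDC_WD2003FYYS-XXXX \
        /dev/disk/by-id/ata-WDC_WD2003FYYS-YYYY
    zpool add tank cache /dev/disk/by-id/ata-INTEL_SSDSA2M080G2GC-ZZZZ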

 and it wouldn't
 be probably stable enough for production. Right?

Depends on how you define stable, and what kind of in-house
expertise you have.

Some companies are selling (or plan to sell, as their product is in an
open beta stage) storage appliances powered by ZFS on Linux (search
the ZoL list for details). So it's definitely stable enough for them.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-26 Thread Eugen Leitl

Dear internets,

I've got an old SunFire X2100M2 with 6-8 GBytes ECC RAM, which
I wanted to put into use with Linux, using the Linux
VServer patch (an analogon to zones), and 2x 2 TByte 
nearline (WD RE4) drives. It occurred to me that the
1U case had enough space to add some SSDs (e.g.
2-4 80 GByte Intel SSDs), and the power supply
should be able to take both the 2x SATA HDs as well 
as 2-4 SATA SSDs, though I would need to splice into
existing power cables.

I also have a few LSI and an IBM M1015 (potentially 
reflashable to IT mode) adapters, so having enough ports
is less of an issue (I'll probably use an LSI
with 4x SAS/SATA for 4x SSD, and keep the onboard SATA
for HDs, or use each 2x for SSD and HD).

Now there are multiple configurations for this. 
Some using Linux (root fs on a RAID10, /home on
RAID 1) or zfs. Now zfs on Linux probably wouldn't
do hybrid zfs pools (would it?) and it probably wouldn't
be stable enough for production. Right?

Assuming I won't have to compromise CPU performance 
(it's an anemic Opteron 1210 at 1.8 GHz, dual core, after all, and
it will probably run several tens of zones in production) and
sacrifice data integrity, can I make e.g. LSI SAS3442E
directly do SSD caching (it says something about CacheCade,
but I'm not sure it's an OS-side driver thing), as it
is supposed to boost IOPS? Unlikely shot, but probably
somebody here would know.

If not, should I go directly OpenIndiana, and use
a hybrid pool?

Should I use all 4x SATA SSDs and 2x SATA HDs to
do a hybrid pool, or would this be overkill?
The SSDs are Intel SSDSA2M080G2GC 80 GByte, so no speed demons
either. However, they've seen some wear and tear and
none of them has keeled over yet. So I think they'll
be good for a few more years.

How would you lay out the pool with OpenIndiana
in either case to maximize IOPS and minimize CPU
load (assuming it's an issue)? I wouldn't mind
trading 1/3 to 1/2 of the CPU to zfs load, if
I can get decent IOPS.

This is terribly specific, I know, but I figured
somebody had tried something like that with an X2100 M2,
it being a rather popular Sun (RIP) Solaris box at
the time. Or not.

Thanks muchly, in any case.

-- Eugen
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss