Re: [zfs-discuss] ZFS Appliance as a general-purpose server question

2012-11-27 Thread Edmund White
On 11/27/12 1:52 AM, Grégory Giannoni s...@narguile.org wrote:



Le 27 nov. 2012 à 01:17, Erik Trimble a écrit :

 On 11/26/2012 12:54 PM, Grégory Giannoni wrote:
 [snip]
 I switched a few months ago from Sun X45x0 to HP kit: my fast NAS boxes are
now DL180 G6. I got better performance using an LSI 9240-8i rather than the HP
SmartArray (tried P410 & P812). I'm using only 600 GB SSD drives.
 That LSI controller supports SATA III, i.e. 6 Gbps SATA.   The Px1x
controllers do 6 Gbps SAS, but only 3 Gbps SATA, so that's your likely perf
difference.  The SmartArray Px2x series should do both SATA and SAS at
6 Gbps.

The SSD drives I'm using (Intel 320 600GB) are limited to 270 MB/s, so
I don't think SATA II is the limiting factor.

 
 That said, I do think you're right that the LSI controller is probably
a better fit for connections requiring a SATA SSD.  The only exception
is having to give up the 1GB of NVRAM on the HP controller. :-(

I don't think that this is a real issue when using a bunch of SSDs. I
even wonder whether the NVRAM isn't actually slowing down writes. My tests were
done with the ZIL enabled, so a power loss shouldn't damage the data.

HP recommends disabling the write accelerator on SSD-only volumes:
http://h2.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=120&prodSeriesId=3802118&prodTypeId=329290&objectID=c02963968



 [...]
 Is the bottleneck the LSI controller, or the SAS/SATA bus, or the PCI-E
bus itself?  That is, have you tested with an LSI 9240-4i (one per 8-drive
cage, which I *believe* can use the HP multi-lane cable), and with an LSI
9260-16i or LSI 9280-24i?   My instinct would be to say it's the PCI-E
bus, and you could probably get away with the 4-channel cards, i.e.
4 channels @ 6 Gbit/s = 3 GB/s > a 4x PCI-E 2.0 link at 2 GB/s.


The first bottleneck we reached (DL 180 / standard 25-drive bay) was the
HP controller (both the P410 and the P812 hit the same numbers: 800 MB/s
write, 1.3 GB/s read).

With the LSI 9240-8i, we reached 1.2 GB/s write, 1.3 GB/s read.

The LSI 9240-4i was not able to connect to the 25-drive bay; I did not test
the LSI 9260-16i or LSI 9280-24i.

The results were the same with 10 or 25 drives, so I suspected either the
PCI bus or the expander in the 25-drive bay (HP 530946-001).
Plugging the disks directly into the LSI card gained a few MB/s:
the expander was limiting a bit, but more importantly it prevented using more
than one disk controller!

Replacing the 25-drive bay with three 8-drive bays (507803-B21) let the
system use three LSI 9240-8i controllers, hence the 4.4 GB/s read rate.


You're right that you've run into the limitation of the expander on the
25-disk drive backplane. However, I'm curious about the 8-drive cage you
mention. I use that cage in the ML/DL370 G6 servers. I didn't think it
would fit into a DL180 G6. How is this arranged in your unit? What does
the resulting setup look like? Since the DL180 drive cages are part of the
bezel, do you just have three loose cages connected to the controllers?

Also, with three controllers, didn't you max out the number of available PCIe
slots?

Anyway, the new HP SL4540 server is the next product worth testing in this
realm… 60 x LFF disks.
http://h18004.www1.hp.com/products/quickspecs/14406_na/14406_na.html



-- 
Edmund White
ewwh...@mac.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Appliance as a general-purpose server question

2012-11-27 Thread Grégory Giannoni

 [...]
 The results were the same with 10 or 25 drives, so I suspected either the
 PCI bus or the expander in the 25-drive bay (HP 530946-001).
 Plugging the disks directly into the LSI card gained a few MB/s:
 the expander was limiting a bit, but more importantly it prevented using more
 than one disk controller!
 [...]
 
 You're right that you've run into the limitation of the expander on the
 25-disk drive backplane. However, I'm curious about the 8-drive cage you
 mention. I use that cage in the ML/DL370 G6 servers. I didn't think it
 would fit into a DL180 G6. How is this arranged in your unit? What does
 the resulting setup look like? Since the DL180 drive cages are part of the
 bezel, do you just have three loose cages connected to the controllers?

It was not as easy as just unplugging the 25-drive bay and plugging in three
8-drive bays. There were a few rivets to drill out, the backplane power cable to
adapt (the pin-out and wire colors are not the same!), a mini-Molex to Molex
cable to make for the drives' power, and some screws to fix the cages in place.
The result is really clean. Here are a few pictures:

http://www.flickr.com/photos/webzinemaker/6964036523/in/photostream/




 Also, with three controllers, didn't you max out the number of available PCIe
 slots?

Four slots are available on the DL180: three were used for the LSI controllers, and
one for a NIC.

 
 Anyway, the new HP SL4540 server is the next product worth testing in this
 realm… 60 x LFF disks.
 http://h18004.www1.hp.com/products/quickspecs/14406_na/14406_na.html

It might be a very good alternative to the X4540... But I wonder how many
controllers are connected, and what their performance is.

-- 
Grégory Giannoni
http://www.wmaker.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Eugen Leitl
 
 can I make e.g. LSI SAS3442E
 directly do SSD caching (it says something about CacheCade,
 but I'm not sure it's an OS-side driver thing), as it
 is supposed to boost IOPS? Unlikely shot, but probably
 somebody here would know.

Depending on the type of work you will be doing, the best thing you can do for
performance is to disable the ZIL (zfs set sync=disabled) and use SSDs for cache.
But don't go *crazy* adding SSDs for cache, because they still have some
in-memory footprint. If you have 8 GB of RAM and 80 GB SSDs, maybe just use one
of them for cache, and let the other three do absolutely nothing. Better yet, put
your OS on a pair of SSDs in a mirror, then use a pair of HDDs in a mirror for the
storage pool, and one SSD for cache. Then you have one SSD unused, which you could
optionally add as a dedicated log device to your storage pool. There are specific
situations where it is or isn't OK to disable the ZIL - look around and ask here
if you have any confusion about it.

Don't do redundancy in hardware.  Let ZFS handle it.
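
To make that concrete, here is a minimal sketch of that layout. The pool name
(tank) and the cXtYd0 device names are placeholders for illustration, and the
mirrored SSD root pool is assumed to be set up at install time:

  # Data pool: one mirrored pair of HDDs
  zpool create tank mirror c0t2d0 c0t3d0

  # One SSD as read cache (L2ARC)
  zpool add tank cache c0t4d0

  # Optionally, the leftover SSD as a dedicated log device
  zpool add tank log c0t5d0

  # Only if your workload really tolerates losing the last few seconds
  # of synchronous writes on a crash or power loss:
  zfs set sync=disabled tank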

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Eugen Leitl
On Tue, Nov 27, 2012 at 12:12:43PM +, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Eugen Leitl
  
  can I make e.g. LSI SAS3442E
  directly do SSD caching (it says something about CacheCade,
  but I'm not sure it's an OS-side driver thing), as it
  is supposed to boost IOPS? Unlikely shot, but probably
  somebody here would know.
 
 Depending on the type of work you will be doing, the best thing you can do for
 performance is to disable the ZIL (zfs set sync=disabled) and use SSDs for
 cache.  But don't go *crazy* adding SSDs for cache, because they still have
 some in-memory footprint.  If you have 8 GB of RAM and 80 GB SSDs, maybe just
 use one of them for cache, and let the other three do absolutely nothing.  Better
 yet, put your OS on a pair of SSDs in a mirror, then use a pair of HDDs in a
 mirror for the storage pool, and one SSD for cache.  Then you have one SSD
 unused, which you could optionally add as a dedicated log device to your storage
 pool.  There are specific situations where it is or isn't OK to disable the ZIL -
 look around and ask here if you have any confusion about it.
 
 Don't do redundancy in hardware.  Let ZFS handle it.

Thanks. I'll try doing that, and see how it works out.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Jim Klimov

Performance-wise, I think you should go for mirrors/raid10, and
separate the pools (i.e. rpool mirror on SSD and data mirror on
HDDs). If you have 4 SSDs, you might mirror the other couple for
zoneroots or some databases in datasets delegated into zones,
for example. Don't use dedup. Carve out some space for L2ARC.
As Ed noted, you might not want to dedicate much disk space due
to remaining RAM pressure when using the cache; however, spreading
the IO load between smaller cache partitions/slices on each SSD
may help your IOPS on average. Maybe go for compression.
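
For instance (pool name and slice names below are made up - adjust to your own
layout), a small slice on each remaining SSD can serve as L2ARC instead of
dedicating a whole device:

  # Add one modest cache slice per SSD to spread the read-cache IOPS
  zpool add datapool cache c0t4d0s3 c0t5d0s3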

I really hope someone better versed in compression - like Saso -
would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in
terms of read-speeds from the pools. My HDD-based assumption is
in general that the less data you read (or write) on platters -
the better, and the spare CPU cycles can usually take the hit.

I'd spread out the different data types (i.e. WORM programs,
WORM-append logs and random-io other application data) into
various datasets with different settings, backed by different
storage - since you have the luxury.

Many best practice documents (and original Sol10/SXCE/LiveUpgrade
requirements) place the zoneroots on the same rpool so they can
be upgraded seamlessly as part of the OS image. However you can
also delegate ZFS datasets into zones and/or have lofs mounts
from GZ to LZ (maybe needed for shared datasets like distros
and homes - and faster/more robust than NFS from GZ to LZ).
For OS images (zoneroots) I'd use gzip-9 or better (likely lz4
when it gets integrated), same for logfile datasets, and lzjb,
zle or none for the random-I/O datasets. For structured things
like databases I also research the block I/O size and set the matching
recordsize (at dataset creation time) to reduce extra ZFS COW work
during writes - at the expense of more metadata.
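
A sketch of such a split (dataset names are invented; use lz4 only once your
build actually has it):

  # Zone roots / OS images: mostly WORM, compress hard
  zfs create -o compression=gzip-9 datapool/zones

  # Log datasets: append-only text, also compresses well
  zfs create -o compression=gzip-9 datapool/logs

  # Random-I/O application data: cheap compression or none
  zfs create -o compression=lzjb datapool/appdata

  # Database with, say, 8K pages: match recordsize at creation time
  zfs create -o recordsize=8k -o compression=lzjb datapool/db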

You'll likely benefit from having OS images on SSDs, logs on
HDDs (including logs from the GZ and LZ OSes, to reduce needless
writes on the SSDs), and databases on SSDs. For other data types it
depends, and in general they would be helped by L2ARC on
the SSDs.

Also note that much of the default OS image is not really used
(i.e. X11 on headless boxes), so you might want to do weird
things with GZ or LZ rootfs data layouts - note that these might
puzzle your beadm/liveupgrade software, so you'll have to do
any upgrades with lots of manual labor :)

On a somewhat orthogonal route, I'd start with setting up a
generic dummy zone, perhaps with much unneeded software,
and zfs-cloning that to spawn application zones. This way
you only pay the footprint price once, at least until you
have to upgrade the LZ OSes - in that case it might be cheaper
(in terms of storage at least) to upgrade the dummy, clone it
again, and port the LZ's customizations (installed software)
by finding the differences between the old dummy and current
zone state (zfs diff, rsync -cn, etc.) In such upgrades you're
really well served by storing volatile data in separate datasets
from the zone OS root - you just reattach these datasets to the
upgraded OS image and go on serving.
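
A sketch of that clone-based workflow, with invented zone/dataset names:

  # Freeze the fully-installed dummy zone's root as a template
  zfs snapshot datapool/zones/dummy@golden

  # Spawn an application zone root from it - you only pay for blocks
  # that later diverge from the template
  zfs clone datapool/zones/dummy@golden datapool/zones/appzone1

  # Later, see what a zone changed relative to the old template
  zfs diff datapool/zones/dummy@golden datapool/zones/appzone1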

As a particular example of the thing often upgraded and taking
considerable disk space per copy - I'd have the current JDK
installed in GZ: either simply lofs-mounted from GZ to LZs,
or in a separate dataset, cloned and delegated into LZs (if
JDK customizations are further needed by some - but not all -
local zones, i.e. timezone updates, trusted CA certs, etc.).

HTH,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Jim Klimov

Now that I've thought about it some more, a follow-up is due on my advice:

1) While the best practices do (did) dictate setting up zoneroots in
   the rpool, this is certainly not required - and I maintain lots of
   systems which store zones in separate data pools. This minimizes
   the write impact on rpools and gives the fuzzy feeling of keeping
   the systems safer from unmountable or overfilled roots.

2) Whether LZs and GZs are in the same rpool for you, or you stack
   tens of your LZ roots in a separate pool, they do in fact offer
   a nice target for dedup - with an expected large dedup ratio which
   would outweigh both the overheads and IO lags (especially if it
   is on an SSD pool) and the inconveniences of my approach with cloned
   dummy zones - especially upgrades thereof. Just remember to use
   the same compression settings (or lack of compression) on all
   zoneroots, so that the zfs blocks for the OS image files will be
   the same and dedupable.
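
Roughly (pool/dataset names are placeholders):

  # Same compression everywhere so identical OS files yield identical blocks
  zfs set compression=lzjb zonepool/zones
  zfs set dedup=on zonepool/zones

  # The DEDUP column shows the achieved ratio
  zpool list zonepool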

HTH,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-27 Thread Fajar A. Nugraha
On Tue, Nov 27, 2012 at 5:13 AM, Eugen Leitl eu...@leitl.org wrote:
 Now there are multiple configurations for this.
 Some using Linux (root fs on a RAID10, /home on
 RAID 1) or zfs. Now zfs on Linux probably wouldn't
 do hybrid zfs pools (would it?)

Sure it does. You can even use the whole disk as zfs, with no
additional partition required (not even for /boot).
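
For example, on ZoL something like the following builds a mirrored data pool
straight on whole disks (device names are examples only, and root-on-ZFS needs
a bit more setup than this):

  zpool create tank mirror /dev/sdb /dev/sdc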

 and it wouldn't
 be probably stable enough for production. Right?

Depends on how you define stable, and what kind of in-house
expertise you have.

Some companies are selling (or plan to sell, as their product is in
open beta stage) storage appliances powered by zfs on linux (search
the ZoL list for details). So it's definitely stable-enough for them.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel DC S3700

2012-11-27 Thread Mauricio Tavares
  Going a bit on a tangent, does anyone know if those drives are
available for sale anywhere?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Question about degraded drive

2012-11-27 Thread Chris Dunbar - Earthside, LLC
Hello,

 

I have a degraded mirror set and this has happened a few times (not
always the same drive) over the last two years. In the past I replaced the
drive and ran zpool replace and all was well. I am wondering, however,
if it is safe to run zpool replace without replacing the drive to see if
it is in fact failed. On traditional RAID systems I have had drives drop
out of an array, but be perfectly fine. Adding them back to the array
returned the drive to service and all was well. Does that approach work
with ZFS? If not, is there another way to test the drive before making the
decision to yank and replace?

 

Thank you!
Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about degraded drive

2012-11-27 Thread Freddie Cash
You don't use replace on mirror vdevs.

'zpool detach' the failed drive. Then 'zpool attach' the new drive.
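
For example, with made-up names, where c0t1d0 is the surviving half of the
mirror and c0t2d0 the failed one:

  # Drop the failed disk out of the mirror
  zpool detach tank c0t2d0

  # Attach the replacement to the surviving disk to re-form the mirror
  zpool attach tank c0t1d0 c0t3d0
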
On Nov 27, 2012 6:00 PM, Chris Dunbar - Earthside, LLC 
cdun...@earthside.net wrote:

 Hello,


 I have a degraded mirror set and this has happened a few times (not
 always the same drive) over the last two years. In the past I replaced the
 drive and ran zpool replace and all was well. I am wondering, however,
 if it is safe to run zpool replace without replacing the drive to see if it
 is in fact failed. On traditional RAID systems I have had drives drop out
 of an array, but be perfectly fine. Adding them back to the array returned
 the drive to service and all was well. Does that approach work with ZFS? If
 not, is there another way to test the drive before making the decision to
 yank and replace?


 Thank you!
 Chris

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about degraded drive

2012-11-27 Thread Freddie Cash
And you can try 'zpool online' on the failed drive to see if it comes back
online.
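
Something along these lines (pool/device names hypothetical):

  zpool online tank c0t2d0
  zpool status tank          # does it resilver and stay ONLINE?
  zpool clear tank c0t2d0    # reset old error counters if it looks healthy
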
On Nov 27, 2012 6:08 PM, Freddie Cash fjwc...@gmail.com wrote:

 You don't use replace on mirror vdevs.

 'zpool detach' the failed drive. Then 'zpool attach' the new drive.
 On Nov 27, 2012 6:00 PM, Chris Dunbar - Earthside, LLC 
 cdun...@earthside.net wrote:

 Hello,


 I have a degraded mirror set and this has happened a few times (not
 always the same drive) over the last two years. In the past I replaced the
 drive and ran zpool replace and all was well. I am wondering, however,
 if it is safe to run zpool replace without replacing the drive to see if it
 is in fact failed. On traditional RAID systems I have had drives drop out
 of an array, but be perfectly fine. Adding them back to the array returned
 the drive to service and all was well. Does that approach work with ZFS? If
 not, is there another way to test the drive before making the decision to
 yank and replace?


 Thank you!
 Chris

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about degraded drive

2012-11-27 Thread Jan Owoc
Hi Chris,

On Tue, Nov 27, 2012 at 6:56 PM, Chris Dunbar - Earthside, LLC 
cdun...@earthside.net wrote:

 Hello,


 I have a degraded mirror set and this has happened a few times (not
 always the same drive) over the last two years. In the past I replaced the
 drive and ran zpool replace and all was well. I am wondering, however,
 if it is safe to run zpool replace without replacing the drive to see if it
 is in fact failed. On traditional RAID systems I have had drives drop out
 of an array, but be perfectly fine. Adding them back to the array returned
 the drive to service and all was well. Does that approach work with ZFS? If
 not, is there another way to test the drive before making the decision to
 yank and replace?



I have two tidbits of useful information.

1) zpool scrub mypoolname will attempt to read all data on all disks in
the pool and verify it against the checksums. If you suspect the disk is fine,
you can clear the errors, run a scrub, and check zpool status to see
whether there are read/checksum errors on the disk. If there are, I'd replace
the drive.

2) If you have an additional hard drive bay/cable/controller, you can do a
zpool replace on the offending drive without doing a detach first -
this may save you if the other drive fails during resilvering.
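
Roughly, with placeholder pool/device names:

  zpool clear tank c0t2d0     # reset the error counters
  zpool scrub tank            # read and verify everything
  zpool status -v tank        # new READ/WRITE/CKSUM errors => replace the disk

  # With a free bay, replace in place: the suspect disk stays in the
  # mirror until the resilver onto the new disk completes
  zpool replace tank c0t2d0 c0t6d0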

Jan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about degraded drive

2012-11-27 Thread Chris Dunbar
Sorry, I was skipping bits to get to the main point. I did use replace (as 
previously instructed on the list). I think that worked because my spare had 
taken over for the failed drive. That's the same situation now - spare in 
service for the failed drive. 

Sent from my iPhone

On Nov 27, 2012, at 9:08 PM, Freddie Cash fjwc...@gmail.com wrote:

 You don't use replace on mirror vdevs.
 
 'zpool detach' the failed drive. Then 'zpool attach' the new drive.
 
 On Nov 27, 2012 6:00 PM, Chris Dunbar - Earthside, LLC 
 cdun...@earthside.net wrote:
 Hello,
 
  
 
 I have a degraded mirror set and this has happened a few times (not
 always the same drive) over the last two years. In the past I replaced the
 drive and ran zpool replace and all was well. I am wondering, however,
 if it is safe to run zpool replace without replacing the drive to see if it 
 is in fact failed. On traditional RAID systems I have had drives drop out of 
 an array, but be perfectly fine. Adding them back to the array returned the 
 drive to service and all was well. Does that approach work with ZFS? If not, 
 is there another way to test the drive before making the decision to yank 
 and replace?
 
  
 
 Thank you!
 Chris
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss