Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Brendan Gregg - Sun Microsystems
G'Day,

On Sat, Feb 13, 2010 at 09:02:58AM +1100, Daniel Carosone wrote:
> On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> > Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
> > size (GB)   300 
> > size (sectors)  585937500   
> > labels (sectors)9232
> > available sectors   585928268   
> > bytes/L2ARC header  200 
> > 
> > recordsize    recordsize    L2ARC capacity    Header size
> > (sectors)     (kBytes)      (records)         (MBytes)
> >      1            0.5        585928268          111,760
> >      2            1          292964134           55,880
> >      4            2          146482067           27,940
> >      8            4           73241033           13,970
> >     16            8           36620516            6,980
> >     32           16           18310258            3,490
> >     64           32            9155129            1,750
> >    128           64            4577564              870
> >    256          128            2288782              440
> > 
> > So, depending on the data, you need somewhere between 440 MBytes and
> > 111 GBytes to hold the L2ARC headers. For a rule of thumb, somewhere
> > between 0.15% and 40% of the total used size. Ok, that rule really
> > isn't very useful...
> 
> All that precision up-front for such a broad conclusion..  bummer :)
> 
> I'm interested in a better rule of thumb, for rough planning
> purposes.  As previously noted, I'm especially interested in the

I use 2.5% for an 8 Kbyte record size.  i.e., for every 1 Gbyte of L2ARC, about
25 Mbytes of ARC is consumed.  I wouldn't recommend other record sizes, since:

- the L2ARC is currently intended for random I/O workloads.  Such workloads
  usually have small record sizes, such as 8 Kbytes.  Larger record sizes (such
  as the 128 Kbyte default) are better for streaming workloads.  The L2ARC
  doesn't currently touch streaming workloads (l2arc_noprefetch=1).

- The best performance from SSDs is with smaller I/O sizes, not larger.  I get
  about 3200 x 8 Kbyte read I/Os per second from my current L2ARC devices, yet
  only about 750 x 128 Kbyte read I/Os per second from the same devices.

- record sizes smaller than 4 Kbytes lead to a lot of ARC headers and worse
  streaming performance.  I wouldn't tune it smaller unless I had to for
  some reason.

So, from the table above I'd only really consider the 4 to 32 Kbyte size range:
4 Kbytes if you really wanted a smaller record size, and 32 Kbytes if you had
limited DRAM you wanted to conserve (at the trade-off of SSD performance).
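
If it helps for planning, the arithmetic is easy to script.  A rough sketch
(plain Python; it assumes a flat ~200 bytes of ARC header per L2ARC record,
which is an approximation rather than exact kernel accounting):

    # Estimate ARC memory consumed by L2ARC headers for a given device
    # size and record size, assuming ~200 bytes per cached record.
    def l2arc_header_overhead(l2arc_bytes, recordsize_bytes, hdr_bytes=200):
        records = l2arc_bytes // recordsize_bytes
        return records * hdr_bytes

    GB = 1024 ** 3
    for rs_kb in (4, 8, 16, 32):
        hdr = l2arc_header_overhead(300 * GB, rs_kb * 1024)
        print("%2d KB records: %6.0f MB of ARC per 300 GB of L2ARC"
              % (rs_kb, hdr / float(1024 ** 2)))

    # At 8 KB records this is 200/8192 = ~2.4% of the L2ARC size,
    # i.e. roughly 25 MB of ARC per 1 GB of L2ARC.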

Brendan


> combination with dedup, where DDT entries need to be cached.  What's
> the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the
> overhead %age above?
> 
> I'm also interested in a more precise answer to a different question,
> later on.  Let's say I already have an L2ARC, running and warm.  How do
> I tell how much is being used?  Presumably, if it's not full, RAM 
> to manage it is the constraint - how can I confirm that and how can I
> tell how much RAM is currently used?
> 
> If I can observe these figures, I can tell if I'm wasting ssd space
> that can't be used.  Either I can reallocate that space or know that
> adding RAM will have an even bigger benefit (increasing both primary
> and secondary cache sizes).  Maybe I can even decide that L2ARC is not
> worth it for this box (especially if it can't fit any more RAM).
> 
> Finally, how smart is L2ARC at optimising this usage? If it's under
> memory pressure, does it prefer to throw out smaller records in favour
> of larger more efficient ones? 
> 
> My current rule of thumb for all this, absent better information, is
> that you should just have gobs of RAM (no surprise there) but that if
> you can't, then dedup seems to be most worthwhile when the pool itself
> is on ssd, no l2arc. Say, a laptop.  Here, you care most about saving
> space and the IO overhead costs least.
> 
> We need some thumbs in between these extremes.  :-(
> 
> --
> Dan.




-- 
Brendan Gregg, Fishworks   http://blogs.sun.com/brendan


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Daniel Carosone
On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
>   size (GB)   300 
>   size (sectors)  585937500   
>   labels (sectors)9232
>   available sectors   585928268   
>   bytes/L2ARC header  200 
>   
> recordsize    recordsize    L2ARC capacity    Header size
> (sectors)     (kBytes)      (records)         (MBytes)
>      1            0.5        585928268          111,760
>      2            1          292964134           55,880
>      4            2          146482067           27,940
>      8            4           73241033           13,970
>     16            8           36620516            6,980
>     32           16           18310258            3,490
>     64           32            9155129            1,750
>    128           64            4577564              870
>    256          128            2288782              440
> 
> So, depending on the data, you need somewhere between 440 MBytes and
> 111 GBytes to hold the L2ARC headers. For a rule of thumb, somewhere
> between 0.15% and 40% of the total used size. Ok, that rule really
> isn't very useful...

All that precision up-front for such a broad conclusion..  bummer :)

I'm interested in a better rule of thumb, for rough planning
purposes.  As previously noted, I'm especially interested in the
combination with dedup, where DDT entries need to be cached.  What's
the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the
overhead %age above?

I'm also interested in a more precise answer to a different question,
later on.  Let's say I already have an L2ARC, running and warm.  How do
I tell how much is being used?  Presumably, if it's not full, RAM 
to manage it is the constraint - how can I confirm that and how can I
tell how much RAM is currently used?
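
My naive starting point would be the arcstats kstats -- assuming the
counter names mean what they appear to (someone please correct me if
they track something subtler):

    # guesswork: l2_size for how much L2ARC is populated, and
    # l2_hdr_size / hdr_size for the RAM spent on (L2)ARC headers
    kstat -p zfs:0:arcstats:l2_size
    kstat -p zfs:0:arcstats:l2_hdr_size
    kstat -p zfs:0:arcstats:hdr_size

but I'd like to know whether those are the right numbers to watch.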

If I can observe these figures, I can tell if I'm wasting ssd space
that can't be used.  Either I can reallocate that space or know that
adding RAM will have an even bigger benefit (increasing both primary
and secondary cache sizes).  Maybe I can even decide that L2ARC is not
worth it for this box (especially if it can't fit any more RAM).

Finally, how smart is L2ARC at optimising this usage? If it's under
memory pressure, does it prefer to throw out smaller records in favour
of larger more efficient ones? 

My current rule of thumb for all this, absent better information, is
that you should just have gobs of RAM (no surprise there) but that if
you can't, then dedup seems to be most worthwhile when the pool itself
is on ssd, no l2arc. Say, a laptop.  Here, you care most about saving
space and the IO overhead costs least.

We need some thumbs in between these extremes.  :-(

--
Dan.



Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Richard Elling
On Feb 12, 2010, at 9:36 AM, Felix Buenemann wrote:

> Am 12.02.10 18:17, schrieb Richard Elling:
>> On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:
>> 
>>> Hi Mickaël,
>>> 
>>> Am 12.02.10 13:49, schrieb Mickaël Maillot:
>>>> Intel X25-M are MLC, not SLC; they are very good for L2ARC.
>>> 
>>> Yes, I'm only using those for L2ARC; I'm planning on getting two Mtron Pro 
>>> 7500 16GB SLC SSDs for ZIL.
>>> 
>>>> and next, you need more RAM:
>>>> ZFS can't handle 4x 80 GB of L2ARC with only 4 GB of RAM, because ZFS
>>>> uses memory to allocate and manage the L2ARC.
>>> 
>>> Is there a guideline for the ratio of L2ARC size to RAM?
>> 
>> Approximately 200 bytes per record. I use the following example:
>>  Suppose we use a Seagate LP 2 TByte disk for the L2ARC
>>  + Disk has 3,907,029,168 512 byte sectors, guaranteed
>>  + Workload uses 8 kByte fixed record size
>>  RAM needed for arc_buf_hdr entries
>>  + Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes
>> 
>> Don't underestimate the RAM needed for large L2ARCs
> 
> I'm not sure how your workload record size plays into above formula (where 
> does - 9232 come from?), but given I've got ~300GB L2ARC, I'd need about 
> 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the L2ARC.

recordsize=8kB=16 sectors @ 512 bytes/sector

9,232 is the number of sectors reserved for labels, around 4.75 MBytes

Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
size (GB)   300 
size (sectors)  585937500   
labels (sectors)9232
available sectors   585928268   
bytes/L2ARC header  200 

recordsize    recordsize    L2ARC capacity    Header size
(sectors)     (kBytes)      (records)         (MBytes)
     1            0.5        585928268          111,760
     2            1          292964134           55,880
     4            2          146482067           27,940
     8            4           73241033           13,970
    16            8           36620516            6,980
    32           16           18310258            3,490
    64           32            9155129            1,750
   128           64            4577564              870
   256          128            2288782              440

So, depending on the data, you need somewhere between 440 MBytes and  111 GBytes
to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40%
of the total used size. Ok, that rule really isn't very useful...
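
(The table is just arithmetic, so it's easy to regenerate for other device
sizes -- a quick sketch, using the same 200 bytes/header and 9,232 label
sectors as above:

    # Regenerate the header-size table for a 300 GB (decimal) L2ARC.
    sectors = 300 * 10**9 // 512 - 9232            # usable 512-byte sectors
    for rs in (1, 2, 4, 8, 16, 32, 64, 128, 256):  # recordsize in sectors
        records = sectors // rs
        hdr_mb = records * 200 / float(2**20)      # "MBytes" as 2^20 here
        print("%4d sectors (%5.1f kB): %10d records, %8.0f MB of headers"
              % (rs, rs * 0.5, records, hdr_mb))

The printed values are unrounded, so they come out slightly different from
the rounded figures above.)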

The next question is, what does my data look like?  The answer is that there
will most likely be a distribution of variously sized records. But the
distribution isn't as interesting for this calculation as the actual number of
records. I'm not sure there is an easy way to get that information, but I'll
look around...
 -- richard




Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Bill Sommerfeld

On 02/12/10 09:36, Felix Buenemann wrote:

given I've got ~300GB L2ARC, I'd
need about 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the
L2ARC.


But that would only leave ~800MB free for everything else the server 
needs to do.


- Bill


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Felix Buenemann

Am 12.02.10 18:17, schrieb Richard Elling:

On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:


Hi Mickaël,

Am 12.02.10 13:49, schrieb Mickaël Maillot:

Intel X25-M are MLC, not SLC; they are very good for L2ARC.


Yes, I'm only using those for L2ARC; I'm planning on getting two Mtron Pro 7500 
16GB SLC SSDs for ZIL.


and next, you need more RAM:
ZFS can't handle 4x 80 GB of L2ARC with only 4 GB of RAM, because ZFS
uses memory to allocate and manage the L2ARC.


Is there a guideline for the ratio of L2ARC size to RAM?


Approximately 200 bytes per record. I use the following example:
Suppose we use a Seagate LP 2 TByte disk for the L2ARC
+ Disk has 3,907,029,168 512 byte sectors, guaranteed
+ Workload uses 8 kByte fixed record size
RAM needed for arc_buf_hdr entries
+ Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes

Don't underestimate the RAM needed for large L2ARCs


I'm not sure how your workload record size plays into above formula 
(where does - 9232 come from?), but given I've got ~300GB L2ARC, I'd 
need about 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the 
L2ARC.



  -- richard



I could upgrade the server to 8GB, but that's the maximum the i975X chipset can 
handle.

Best Regards,
Felix Buenemann



- Felix





Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Richard Elling
On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:

> Hi Mickaël,
> 
> Am 12.02.10 13:49, schrieb Mickaël Maillot:
>> Intel X25-M are MLC, not SLC; they are very good for L2ARC.
> 
> Yes, I'm only using those for L2ARC; I'm planning on getting two Mtron Pro 7500 
> 16GB SLC SSDs for ZIL.
> 
>> and next, you need more RAM:
>> ZFS can't handle 4x 80 GB of L2ARC with only 4 GB of RAM, because ZFS
>> uses memory to allocate and manage the L2ARC.
> 
> Is there a guideline for the ratio of L2ARC size to RAM?

Approximately 200 bytes per record. I use the following example:
Suppose we use a Seagate LP 2 TByte disk for the L2ARC
+ Disk has 3,907,029,168 512 byte sectors, guaranteed
+ Workload uses 8 kByte fixed record size
RAM needed for arc_buf_hdr entries
+ Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes

Don't underestimate the RAM needed for large L2ARCs
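
For anyone who wants to plug in their own numbers, the same arithmetic in a
couple of lines of Python (a sketch; the ~200 bytes/header figure is
approximate):

    sectors = 3907029168 - 9232        # usable sectors after the labels
    records = sectors // 16            # 8 kByte records = 16 x 512-byte sectors
    print("%.1f GBytes" % (records * 200 / 10.0**9))   # ~48.8 GBytes
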
 -- richard

> 
> I could upgrade the server to 8GB, but that's the maximum the i975X chipset 
> can handle.
> 
> Best Regards,
>Felix Buenemann
> 
> 


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Felix Buenemann

Hi Mickaël,

Am 12.02.10 13:49, schrieb Mickaël Maillot:

Intel X25-M are MLC, not SLC; they are very good for L2ARC.


Yes, I'm only using those for L2ARC; I'm planning on getting two Mtron Pro 
7500 16GB SLC SSDs for ZIL.



and next, you need more RAM:
ZFS can't handle 4x 80 GB of L2ARC with only 4 GB of RAM, because ZFS
uses memory to allocate and manage the L2ARC.


Is there a guideline for the ratio of L2ARC size to RAM?

I could upgrade the server to 8GB, but that's the maximum the i975X 
chipset can handle.


Best Regards,
Felix Buenemann




Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Mickaël Maillot
Hi

Intel X25-M are MLC, not SLC; they are very good for L2ARC.

and next, you need more RAM:
ZFS can't handle 4x 80 GB of L2ARC with only 4 GB of RAM, because ZFS
uses memory to allocate and manage the L2ARC.

2010/2/10 Felix Buenemann :
> Am 09.02.10 09:58, schrieb Felix Buenemann:
>>
>> Am 09.02.10 02:30, schrieb Bob Friesenhahn:
>>>
>>> On Tue, 9 Feb 2010, Felix Buenemann wrote:

>>>> Well to make things short: Using JBOD + ZFS Striped Mirrors vs.
>>>> controller's RAID10, dropped the max. sequential read I/O from over
>>>> 400 MByte/s to below 300 MByte/s. However random I/O and sequential
>>>> writes seemed to perform
>>>
>>> Much of the difference is likely that your controller implements true
>>> RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors.
>>> Since zfs does not use true striping across vdevs, it relies on
>>> sequential prefetch requests to get the sequential read rate up.
>>> Sometimes zfs's prefetch is not aggressive enough.
>>>
>>> I have observed that there may still be considerably more read
>>> performance available (to another program/thread) even while a benchmark
>>> program is reading sequentially as fast as it can.
>>>
>>> Try running two copies of your benchmark program at once and see what
>>> happens.
>>
>> Yes, JBOD + ZFS load-balanced mirrors does seem to work better under
>> heavy load. I tried rebooting a Windows VM from NFS, which took about 43
>> sec with hot cache in both cases. But when doing this during a bonnie++
>> benchmark run, the ZFS mirrors would win big time, taking just 2:47sec
>> instead of over 4min to reboot the VM.
>> So I think in a real world scenario, the ZFS mirrors will win.
>>
>> On a side note, however, I noticed that with small sequential I/O (copying a
>> 150MB source tree to NFS), the ZFS mirrors were 50% slower than the
>> controller's RAID10.
>
> I had a hunch that the controller's volume read-ahead would interfere with
> the ZFS load-shared mirrors and voilà: sequential reads jumped from 270
> MByte/s to 420 MByte/s, which checks out nicely, because writes are about
> 200 MByte/s.
>
>>
>>> Bob
>>
>> - Felix
>
> - Felix
>


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-09 Thread Felix Buenemann

Am 09.02.10 09:58, schrieb Felix Buenemann:

Am 09.02.10 02:30, schrieb Bob Friesenhahn:

On Tue, 9 Feb 2010, Felix Buenemann wrote:


Well to make things short: Using JBOD + ZFS Striped Mirrors vs.
controller's RAID10, dropped the max. sequential read I/O from over
400 MByte/s to below 300 MByte/s. However random I/O and sequential
writes seemed to perform


Much of the difference is likely that your controller implements true
RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors.
Since zfs does not use true striping across vdevs, it relies on
sequential prefetch requests to get the sequential read rate up.
Sometimes zfs's prefetch is not aggressive enough.

I have observed that there may still be considerably more read
performance available (to another program/thread) even while a benchmark
program is reading sequentially as fast as it can.

Try running two copies of your benchmark program at once and see what
happens.


Yes, JBOD + ZFS load-balanced mirrors does seem to work better under
heavy load. I tried rebooting a Windows VM from NFS, which took about 43
sec with hot cache in both cases. But when doing this during a bonnie++
benchmark run, the ZFS mirrors would win big time, taking just 2:47sec
instead of over 4min to reboot the VM.
So I think in a real world scenario, the ZFS mirrors will win.

On a side note, however, I noticed that with small sequential I/O (copying a
150MB source tree to NFS), the ZFS mirrors were 50% slower than the
controller's RAID10.


I had a hunch that the controller's volume read-ahead would interfere 
with the ZFS load-shared mirrors and voilà: sequential reads jumped from 
270 MByte/s to 420 MByte/s, which checks out nicely, because writes are 
about 200 MByte/s.





Bob


- Felix


- Felix



Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-09 Thread Felix Buenemann

Am 09.02.10 02:30, schrieb Bob Friesenhahn:

On Tue, 9 Feb 2010, Felix Buenemann wrote:


Well to make things short: Using JBOD + ZFS Striped Mirrors vs.
controller's RAID10, dropped the max. sequential read I/O from over
400 MByte/s to below 300 MByte/s. However random I/O and sequential
writes seemed to perform


Much of the difference is likely that your controller implements true
RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors.
Since zfs does not use true striping across vdevs, it relies on
sequential prefetch requests to get the sequential read rate up.
Sometimes zfs's prefetch is not aggressive enough.

I have observed that there may still be considerably more read
performance available (to another program/thread) even while a benchmark
program is reading sequentially as fast as it can.

Try running two copies of your benchmark program at once and see what
happens.


Yes, JBOD + ZFS load-balanced mirrors does seem to work better under 
heavy load. I tried rebooting a Windows VM from NFS, which took about 43 
sec with hot cache in both cases. But when doing this during a bonnie++ 
benchmark run, the ZFS mirrors would win big time, taking just 2:47sec 
instead of over 4min to reboot the VM.

So I think in a real world scenario, the ZFS mirrors will win.

On a side note, however, I noticed that with small sequential I/O (copying a 
150MB source tree to NFS), the ZFS mirrors were 50% slower than the 
controller's RAID10.



Bob


- Felix




Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Bob Friesenhahn

On Tue, 9 Feb 2010, Felix Buenemann wrote:


Well to make things short: Using JBOD + ZFS Striped Mirrors vs. controller's 
RAID10, dropped the max. sequential read I/O from over 400 MByte/s to below 
300 MByte/s. However random I/O and sequential writes seemed to perform


Much of the difference is likely that your controller implements true 
RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors. 
Since zfs does not use true striping across vdevs, it relies on 
sequential prefetch requests to get the sequential read rate up. 
Sometimes zfs's prefetch is not aggressive enough.


I have observed that there may still be considerably more read 
performance available (to another program/thread) even while a 
benchmark program is reading sequentially as fast as it can.


Try running two copies of your benchmark program at once and see what 
happens.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Felix Buenemann

Am 08.02.10 22:23, schrieb Bob Friesenhahn:

On Mon, 8 Feb 2010, Richard Elling wrote:



If there is insufficient controller bandwidth capacity, then the
controller becomes the bottleneck.


We don't tend to see this for HDDs, but SSDs can crush a controller and
channel.


It is definitely seen with older PCI hardware.


Well to make things short: Using JBOD + ZFS Striped Mirrors vs. 
controller's RAID10 dropped the max. sequential read I/O from over 400 
MByte/s to below 300 MByte/s. However, random I/O and sequential writes 
seemed to perform equally well.
One thing, however, was much better using ZFS mirrors: random seek 
performance was about 4 times higher, so I guess for random I/O on a 
busy system the JBOD would win.


The controller can deliver 800 MByte/s on cache hits and is connected 
via PCIe x8, so theoretically it should have enough PCI bandwidth. Its 
CPU is the older 500MHz IOP333, so it has less power than the newer 
IOP348 controllers with 1.2GHz CPUs.


Too bad I have no choice but to use HW RAID, because the mainboard BIOS 
only supports 7 boot devices, so it can't boot from the right disk if 
the Areca is in JBOD, and I found no way to disable the controller's BIOS.

Well maybe I could flash the EFI BIOS to work around this...
(I've done my tests by reconfiguring the controller at runtime.)



Bob


- Felix




Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Bob Friesenhahn

On Mon, 8 Feb 2010, Richard Elling wrote:


If there is insufficient controller bandwidth capacity, then the 
controller becomes the bottleneck.


We don't tend to see this for HDDs, but SSDs can crush a controller and
channel.


It is definitely seen with older PCI hardware.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Richard Elling
To add to Bob's notes...

On Feb 8, 2010, at 8:37 AM, Bob Friesenhahn wrote:
> On Mon, 8 Feb 2010, Felix Buenemann wrote:
>> 
>> I was under the impression that using HW RAID10 would save me 50% PCI 
>> bandwidth and allow the controller to more intelligently handle its cache, 
>> so I stuck with it. But I should run some benchmarks in RAID10 vs. JBOD 
>> with ZFS mirrors to see if this makes a difference.
> 
> The answer to this is "it depends".  If the PCI-E and controller have enough 
> bandwidth capacity, then the write bottleneck will be the disk itself.  

If you have HDDs, the write bandwidth bottleneck will be the disk.

> If there is insufficient controller bandwidth capacity, then the controller 
> becomes the bottleneck.

We don't tend to see this for HDDs, but SSDs can crush a controller and
channel.
 -- richard



Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Bob Friesenhahn

On Mon, 8 Feb 2010, Felix Buenemann wrote:


I was under the impression that using HW RAID10 would save me 50% PCI 
bandwidth and allow the controller to more intelligently handle its cache, so 
I stuck with it. But I should run some benchmarks in RAID10 vs. JBOD with 
ZFS mirrors to see if this makes a difference.


The answer to this is "it depends".  If the PCI-E and controller have 
enough bandwidth capacity, then the write bottleneck will be the 
disk itself.  If there is insufficient controller bandwidth capacity, 
then the controller becomes the bottleneck.   If the bottleneck is the 
disks, then there is hardly any write penalty from using zfs mirrors. 
If the bottleneck is the controller, then you may see 1/2 the 
write performance due to using zfs mirrors.


If you are using modern computing hardware, then the disks should be 
the bottleneck.


Performance of HW RAID controllers is a complete unknown, and they tend 
to store the data in a format that depends on the specific controller, 
which really sucks if the controller fails.  It is usually better to 
run the controller in a JBOD mode (taking advantage of its write 
cache, if available) and use zfs mirrors.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Felix Buenemann

Hi Daniel,

Am 08.02.10 05:45, schrieb Daniel Carosone:

On Mon, Feb 08, 2010 at 04:58:38AM +0100, Felix Buenemann wrote:

I have some questions about the choice of SSDs to use for ZIL and L2ARC.


I have one answer.  The other questions are mostly related to your
raid controller, which I can't answer directly.


- Is it safe to run the L2ARC without battery backup with write cache
enabled?


Yes, it's just a cache, errors will be detected and re-fetched from
the pool. Also, it is volatile-at-reboot (starts cold) at present
anyway, so preventing data loss at power off is not worth spending any
money or time over.


Thanks for clarifying this.


- Does it make sense to use HW RAID10 on the storage controller or would
I get better performance out of JBOD + ZFS RAIDZ2?


A more comparable alternative would be using the controller in jbod
mode and a pool of zfs mirror vdevs.  I'd expect that gives similar
performance to the controller's mirroring (unless higher pci bus usage
is a bottleneck) but gives you the benefits of zfs healing on disk
errors.


I was under the impression that using HW RAID10 would save me 50% PCI 
bandwidth and allow the controller to more intelligently handle its 
cache, so I stuck with it. But I should run some benchmarks in RAID10 
vs. JBOD with ZFS mirrors to see if this makes a difference.



Performance of RaidZ/5 vs mirrors is a much more workload-sensitive
question, regardless of the additional implementation-specific
wrinkles of either kind.

Your emphasis on lots of slog and l2arc suggests performance is a
priority.  Whether all this kit is enough to hide the IOPS penalty of
raidz/5, or whether you need it even to make mirrors perform
adequately, you'll have to decide yourself.


So it seems right to assume that RAIDZ1/2 has about the same 
performance hit as HW RAID5/6 with write cache. I wasn't aware that ZFS 
can do RAID10-style multiple mirrors, so that seems to be the better 
option anyway.
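
(If I understand it correctly, that's simply a pool built from several
mirror vdevs -- something along these lines, with made-up device names:

    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
    zpool add tank mirror c1t4d0 c1t5d0    # add more mirror pairs later

and ZFS load-shares reads and writes across the mirror pairs.)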



--
Dan.


- Felix



Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-07 Thread Daniel Carosone
On Mon, Feb 08, 2010 at 04:58:38AM +0100, Felix Buenemann wrote:
> I have some questions about the choice of SSDs to use for ZIL and L2ARC.

I have one answer.  The other questions are mostly related to your
raid controller, which I can't answer directly.

> - Is it safe to run the L2ARC without battery backup with write cache  
> enabled?

Yes, it's just a cache, errors will be detected and re-fetched from
the pool. Also, it is volatile-at-reboot (starts cold) at present
anyway, so preventing data loss at power off is not worth spending any
money or time over.

> - Does it make sense to use HW RAID10 on the storage controller or would  
> I get better performance out of JBOD + ZFS RAIDZ2?

A more comparable alternative would be using the controller in jbod
mode and a pool of zfs mirror vdevs.  I'd expect that gives similar
performance to the controller's mirroring (unless higher pci bus usage
is a bottleneck) but gives you the benefits of zfs healing on disk
errors. 

Performance of RaidZ/5 vs mirrors is a much more workload-sensitive
question, regardless of the additional implementation-specific
wrinkles of either kind.

Your emphasis on lots of slog and l2arc suggests performance is a
priority.  Whether all this kit is enough to hide the IOPS penalty of
raidz/5, or whether you need it even to make mirrors perform
adequately, you'll have to decide yourself. 

--
Dan.
