Re: [zfs-discuss] ZFS and TRIM

2011-02-04 Thread Orvar Korvar
So, the bottom line is that Solaris 11 Express cannot use TRIM with an SSD?
Is that the conclusion? So it might not be a good idea to use an SSD?


Re: [zfs-discuss] ZFS and L2ARC memory requirements?

2011-02-04 Thread Orvar Korvar
100TB storage? Cool! What is the hardware? How many discs? Gief me ze hardware! 
:oP


Re: [zfs-discuss] ZFS and TRIM

2011-02-04 Thread Deano
Even without TRIM, and even after lots of use, SSDs are still likely to
perform better than spindle disks as ZIL and L2ARC devices; just don't
expect the same performance you got after a fresh wipe/install.

It makes sense to go with a brand that has the best garbage collection you
can find, and leaving some extra space unused also helps.

Bye,
Deano


-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Orvar Korvar
Sent: 04 February 2011 13:20
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS and TRIM

So, the bottom line is that Solaris 11 Express cannot use TRIM with an SSD?
Is that the conclusion? So it might not be a good idea to use an SSD?



Re: [zfs-discuss] ZFS and TRIM

2011-02-04 Thread Pawel Jakub Dawidek
On Sat, Jan 29, 2011 at 11:31:59AM -0500, Edward Ned Harvey wrote:
> What is the status of ZFS support for TRIM?
[...]

I've no idea, but because I've wanted to add such support to FreeBSD/ZFS
for a while now, I'll share my thoughts.

The problem is where to put those operations. ZFS internally has a
ZIO_TYPE_FREE request, which represents exactly what we need: an offset
and size to free. It would be best to just pass those requests directly
to the VDEVs, but we can't do that: there might be a transaction group
that is never committed because of a power failure, and we would have
TRIMed blocks that we still need after boot.
Ok, maybe we could just make such an operation part of the transaction
group? No, we can't do that either. If we start committing the transaction
group and execute the TRIM operations, we may still hit a power failure;
the TRIMs of the old blocks cannot be undone, so we would come back up
to invalid data.

So why not move the TRIM operations to the next transaction group? That's
doable, although we still need to be careful not to TRIM blocks that
were freed in the previous transaction group but are reallocated in the
current one (or, if we do TRIM them, to TRIM first and then write).
Unfortunately, we don't want to TRIM blocks immediately anyway. Consider
disks that lie about the cache flush operation: because of them, ZFS tries
to keep blocks freed in the last few transaction groups around, so you
can forcibly rewind to one of the previous txgs if such corruption occurs.

My initial idea was to implement 100% reliable TRIM, so that I could
build secure delete on top of it; e.g. if ZFS is placed on top of a disk
encryption layer, that layer can implement TRIM as 'overwrite the given
range with random data'. Making TRIM 100% reliable would be very hard,
IMHO.  But in most cases we don't need TRIM to be that perfect. My
current idea is to delay the TRIM operation for some number of transaction
groups.  For example, if a block is freed in txg=5, I'll send the TRIM
for it after txg=15 (if it wasn't reallocated in the meantime).  This is
OK even if we crash before we get to txg=15, because the only side effect
is that the next write to this range might be a little slower.
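
To make the idea concrete, here is a minimal sketch in C of such a
delayed-TRIM queue. All names (trim_extent, trim_enqueue, trim_flush,
issue_trim, TRIM_TXG_DELAY) are invented for illustration -- this is not
actual FreeBSD/ZFS code, and a real implementation would hook into the
allocator and the txg sync path:

#include <sys/queue.h>	/* BSD TAILQ macros */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TRIM_TXG_DELAY	10	/* freed in txg 5 => TRIM after txg 15 */

struct trim_extent {
	uint64_t te_offset;	/* byte offset of the freed range */
	uint64_t te_size;	/* length of the freed range */
	uint64_t te_txg;	/* txg in which the range was freed */
	TAILQ_ENTRY(trim_extent) te_link;
};

static TAILQ_HEAD(, trim_extent) trim_queue =
    TAILQ_HEAD_INITIALIZER(trim_queue);

/* Hypothetical stand-in for issuing a TRIM/UNMAP to the vdev. */
static void
issue_trim(uint64_t offset, uint64_t size)
{
	printf("TRIM offset=%ju size=%ju\n",
	    (uintmax_t)offset, (uintmax_t)size);
}

/* A freed range is queued instead of TRIMed immediately; the queue
 * stays sorted by txg because txgs only ever grow. */
static void
trim_enqueue(uint64_t offset, uint64_t size, uint64_t txg)
{
	struct trim_extent *te = malloc(sizeof(*te));

	if (te == NULL)
		return;		/* sketch: no real error handling */
	te->te_offset = offset;
	te->te_size = size;
	te->te_txg = txg;
	TAILQ_INSERT_TAIL(&trim_queue, te, te_link);
}

/* Called once 'synced_txg' is safely on stable storage: TRIM only
 * ranges freed at least TRIM_TXG_DELAY txgs ago.  Ranges reallocated
 * in the meantime are assumed to have been removed from the queue by
 * the allocator (not shown). */
static void
trim_flush(uint64_t synced_txg)
{
	struct trim_extent *te;

	while ((te = TAILQ_FIRST(&trim_queue)) != NULL) {
		if (te->te_txg + TRIM_TXG_DELAY > synced_txg)
			break;	/* later entries are newer still */
		issue_trim(te->te_offset, te->te_size);
		TAILQ_REMOVE(&trim_queue, te, te_link);
		free(te);
	}
}

The one piece not shown is the allocator hook: if a queued range is
handed out again before the delay expires, it must be pulled off the
queue before trim_flush() ever sees it.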

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: [zfs-discuss] ZFS and TRIM

2011-02-04 Thread Christopher George
> So, the bottom line is that Solaris 11 Express cannot use TRIM
> with an SSD?

Correct.

> So it might not be a good idea to use an SSD?

It is true that a Flash-based SSD will be adversely impacted by
ZFS not supporting TRIM, especially for the ZIL accelerator.

But a DRAM-based SSD is indifferent to TRIM support and thus
unaffected.  Actually, TRIM support would only add unnecessary
overhead to the DDRdrive X1's device driver.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com


Re: [zfs-discuss] ZFS and TRIM

2011-02-04 Thread David Magda
On Fri, February 4, 2011 09:30, Pawel Jakub Dawidek wrote:
> But in most cases we don't need TRIM to be that perfect. My current
> idea is to delay the TRIM operation for some number of transaction
> groups. For example, if a block is freed in txg=5, I'll send the TRIM
> for it after txg=15 (if it wasn't reallocated in the meantime). This
> is OK even if we crash before we get to txg=15, because the only side
> effect is that the next write to this range might be a little slower.

Off the top of my head, I can think of two instances where non-recent
blocks would still be needed:
* snapshots
* importing via recovery mode (zpool import -F mypool)

For the latter, given that each vdev label can hold up to 128 uberblocks,
a recovery-mode import can go back as many as 128 transactions for a
single non-mirrored device, so in the worst case you would need to avoid
TRIMing anything less than 128 transactions old.

Of course, if you have a pair of mirrored vdevs/disks and each one has 128
uberblocks, that's potentially 256 txgs you can recover from (and it goes
up with more vdevs, of course). That may be excessive, but perhaps there
could be a tunable sysctl for the maximum distance to go back TRIMing
(defaulting to 128? 64? 32?).
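
As a hedged sketch of what such a tunable might look like in C (the
identifiers are invented; only the 128-slot uberblock ring per label is
real):

#include <stdint.h>

#define VDEV_UBERBLOCK_RING	128	/* uberblock slots per vdev label */

/* Hypothetical tunable: how many txgs back freed blocks are kept
 * un-TRIMed, so 'zpool import -F' can still rewind to them. */
static uint64_t trim_txg_keep = VDEV_UBERBLOCK_RING;

/* A range freed in 'freed_txg' becomes TRIMable only once the pool
 * has synced this txg. */
static uint64_t
trim_safe_after(uint64_t freed_txg)
{
	return (freed_txg + trim_txg_keep);
}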

I'm not sure how ZFS keeps track of snapshots: is there something
in-memory, or is it necessary to walk the tree? Perhaps one could get the
list of snapshots, take the oldest birth time (i.e., the smallest txg),
and TRIM only blocks born before that txg? Given that txgs are committed
every 5-30s, and I/O isn't done between them, that idle time could be
used for sending the TRIM commands?
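
A rough sketch of that heuristic in C (all names are invented, and how
the snapshot list is actually obtained is exactly the open question
above):

#include <stddef.h>
#include <stdint.h>

struct snapshot {
	uint64_t ss_birth_txg;	/* txg in which the snapshot was taken */
};

/* Smallest birth txg across all snapshots; UINT64_MAX if there are
 * no snapshots at all. */
static uint64_t
oldest_snapshot_txg(const struct snapshot *snaps, size_t count)
{
	uint64_t oldest = UINT64_MAX;

	for (size_t i = 0; i < count; i++)
		if (snaps[i].ss_birth_txg < oldest)
			oldest = snaps[i].ss_birth_txg;
	return (oldest);
}

/* A freed block is a TRIM candidate only if it was born before the
 * oldest snapshot, i.e. no snapshot can still reference it. */
static int
trim_candidate(uint64_t block_birth_txg, uint64_t oldest_snap_txg)
{
	return (block_birth_txg < oldest_snap_txg);
}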

Presumably the Oracle folks are looking at this as well internally.



Re: [zfs-discuss] ZFS and TRIM

2011-02-04 Thread Erik Trimble

On 2/4/2011 7:39 AM, Christopher George wrote:

>> So, the bottom line is that Solaris 11 Express cannot use TRIM
>> with an SSD?
>
> Correct.
>
>> So it might not be a good idea to use an SSD?
>
> It is true that a Flash-based SSD will be adversely impacted by
> ZFS not supporting TRIM, especially for the ZIL accelerator.
>
> But a DRAM-based SSD is indifferent to TRIM support and thus
> unaffected.  Actually, TRIM support would only add unnecessary
> overhead to the DDRdrive X1's device driver.
>
> Best regards,
>
> Christopher George
> Founder/CTO
> www.ddrdrive.com


Bottom line here is this:  for a ZIL, you have a hierarchy of 
performance, each tier about two orders of magnitude faster than the 
previous one:


1. hard drive
2. NAND-based SSD
3. DRAM-based SSD


You'll still get a very noticeable improvement from using a NAND (flash) 
SSD over not using a dedicated ZIL device at all.  It just won't be the 
improvement promised by the SSD packaging.


If that performance isn't sufficient for you, then a DRAM SSD is your 
best bet.



Note that even if TRIM were supported, it wouldn't remove the whole 
penalty that a fully-written NAND SSD suffers. NAND requires that any 
block previously written to be erased BEFORE it can be written to 
again.  TRIM only helps the controller make use of still-unwritten 
pages and schedule whole-block erasures in advance.   I can't put 
real numbers on it, but I would suspect that rather than suffering a 
10x loss of performance, you might only lose 5x or so if TRIM were 
properly usable.
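
To illustrate the mechanics, here is a toy model in C (made-up names and
geometry; real controllers also relocate live pages before erasing,
which this omits):

#include <stdbool.h>

#define PAGES_PER_BLOCK	64	/* made-up geometry */

struct nand_block {
	bool page_written[PAGES_PER_BLOCK];
};

/* Erase works only on whole blocks, and it is the slow operation. */
static void
nand_erase_block(struct nand_block *b)
{
	for (int i = 0; i < PAGES_PER_BLOCK; i++)
		b->page_written[i] = false;
}

/* A page write must land on an erased page.  Rewriting a used page
 * forces a whole-block erase first -- the cost TRIM lets the
 * controller pay in advance, during idle time. */
static void
nand_write_page(struct nand_block *b, int page)
{
	if (b->page_written[page])
		nand_erase_block(b);	/* toy model: no live-page copy-out */
	b->page_written[page] = true;
}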


--

Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] ZFS and L2ARC memory requirements?

2011-02-04 Thread Roy Sigurd Karlsbakk
> 100TB storage? Cool! What is the hardware? How many discs? Gief me ze
> hardware! :oP

We have two 100TB boxes running as Bacula storage agents, each a
Supermicro box with 40 disks (two 2.5" 500GB drives internally for the
rpool, two SSDs for the SLOG). The remaining 35 slots include two L2ARC
devices (Micron, formerly the Crucial RealSSD C300). The drives are WD
Blacks, and they seem fairly stable, though we have seen bad iostat
numbers from some of them (those have been returned). The storage
consists mainly of 11 RAIDz2 VDEVs of seven 2TB drives each.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.





Re: [zfs-discuss] ZFS and L2ARC memory requirements?

2011-02-04 Thread Roy Sigurd Karlsbakk
> We have two 100TB boxes running as Bacula storage agents, each a
> Supermicro box with 40 disks (two 2.5" 500GB drives internally for the
> rpool, two SSDs for the SLOG). The remaining 35 slots include two L2ARC
> devices (Micron, formerly the Crucial RealSSD C300). The drives are WD
> Blacks, and they seem fairly stable, though we have seen bad iostat
> numbers from some of them (those have been returned). The storage
> consists mainly of 11 RAIDz2 VDEVs of seven 2TB drives each.

Add a 45-drive JBOD to that to make the math work. The JBOD is a similar
Supermicro box, but without the motherboard, attached via SAS.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.