Re: [zfs-discuss] ZFS and TRIM
So, the bottom line is that Solaris 11 Express cannot use TRIM with an SSD? Is that the conclusion? So it might not be a good idea to use an SSD?

-- This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and L2ARC memory requirements?
100TB storage? Cool! What is the hardware? How many disks? Gief me ze hardware! :oP
Re: [zfs-discuss] ZFS and TRIM
Even without TRIM and with lots of use, SSDs are still likely to perform better than spindle disks as ZIL and L2ARC devices; just don't expect the same performance you got after a fresh wipe/install. It makes sense to go with the brands that have the best garbage collection you can find, and leaving more space unused also helps.

Bye,
Deano

-----Original Message-----
From: Orvar Korvar
Sent: 04 February 2011 13:20
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS and TRIM

> So, the bottom line is that Solaris 11 Express can not use TRIM and SSD?
> Is that the conclusion? So, it might not be a good idea to use a SSD?
Re: [zfs-discuss] ZFS and TRIM
On Sat, Jan 29, 2011 at 11:31:59AM -0500, Edward Ned Harvey wrote:
> What is the status of ZFS support for TRIM? [...]

I've no idea, but because I've wanted to add such support to FreeBSD/ZFS for a while now, I'll share my thoughts.

The problem is where to put those operations. ZFS internally has the ZIO_TYPE_FREE request, which represents exactly what we need: an offset and size to free. It would be best to just pass those requests directly to the VDEVs, but we can't do that. There might be a transaction group that is never committed because of a power failure, and we would have TRIMed blocks that we still want to use after boot.

Ok, maybe we could make such an operation part of the transaction group? No, we can't do that either. If we start committing the transaction group and execute the TRIM operations, we may still hit a power failure, and TRIM operations on the old blocks cannot be undone, so we would be left with invalid data.

So why not move the TRIM operations to the next transaction group? That's doable, although we still need to be careful not to TRIM blocks that were freed in the previous transaction group but are reallocated in the current one (or if we do TRIM them, we TRIM first and then write).

Unfortunately, we don't want to TRIM blocks immediately anyway. Take into account disks that lie about the cache flush operation: because of them, ZFS tries to keep freed blocks from the last few transaction groups around, so you can forcibly rewind to one of the previous txgs if such corruption occurs.

My initial idea was to implement 100% reliable TRIM, so that I could implement secure delete with it, e.g. if ZFS is placed on top of a disk encryption layer, that layer could implement TRIM as 'overwrite the given range with random data'. Making TRIM 100% reliable will be very hard, IMHO. But in most cases we don't need TRIM to be so perfect.

My current idea is to delay the TRIM operation for some number of transaction groups.
For example, if a block is freed in txg=5, I'll send a TRIM for it after txg=15 (if it wasn't reassigned in the meantime). This is OK even if we crash before we get to txg=15, because the only side effect is that the next write to this range might be a little slower.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
p...@freebsd.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
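The delayed-TRIM scheme described above can be sketched in a few lines. This is an illustrative model, not ZFS code; the class and method names are hypothetical, and the real implementation would live in the txg commit path:

```python
# Sketch of Pawel's proposal: a freed extent is queued together with the
# txg in which it was freed, and the TRIM is only issued once a number of
# further transaction groups have committed -- unless the extent was
# reallocated in the meantime, which cancels the pending TRIM.

TRIM_DELAY_TXGS = 10  # e.g. freed in txg 5 -> TRIM sent after txg 15

class DelayedTrimQueue:
    def __init__(self, delay=TRIM_DELAY_TXGS):
        self.delay = delay
        self.pending = {}  # (offset, size) -> txg in which it was freed

    def block_freed(self, offset, size, txg):
        self.pending[(offset, size)] = txg

    def block_reallocated(self, offset, size):
        # Reallocation cancels a pending TRIM for that extent; the write
        # must not race with (or follow) a TRIM of the same range.
        self.pending.pop((offset, size), None)

    def txg_committed(self, txg):
        """Return extents whose TRIM is now safe to send to the vdev."""
        ripe = [ext for ext, freed in self.pending.items()
                if txg >= freed + self.delay]
        for ext in ripe:
            del self.pending[ext]
        return ripe

q = DelayedTrimQueue()
q.block_freed(0x1000, 4096, txg=5)
q.block_freed(0x2000, 4096, txg=5)
q.block_reallocated(0x2000, 4096)      # rewritten before the delay expired
assert q.txg_committed(14) == []       # too early: 14 < 5 + 10
assert q.txg_committed(15) == [(0x1000, 4096)]
```

A crash before txg 15 costs nothing in correctness: the queue is simply lost and the extents are never trimmed, matching the "next write might be a little slower" side effect described above.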
Re: [zfs-discuss] ZFS and TRIM
> So, the bottom line is that Solaris 11 Express can not use TRIM and SSD?

Correct.

> So, it might not be a good idea to use a SSD?

It is true that a Flash-based SSD will be adversely impacted by ZFS not supporting TRIM, especially as a ZIL accelerator. But a DRAM-based SSD is immune to TRIM support status and thus unaffected. Actually, TRIM support would only add unnecessary overhead to the DDRdrive X1's device driver.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] ZFS and TRIM
On Fri, February 4, 2011 09:30, Pawel Jakub Dawidek wrote:
> But in most cases we don't need TRIM to be so perfect. My current idea
> is to delay TRIM operation for some number of transaction groups. For
> example if block is freed in txg=5, I'll send TRIM for it after txg=15
> (if it wasn't reassigned in the meantime). This is ok if we crash before
> we get to txg=15, because the only side-effect is that next write to
> this range might be a little slower.

Off the top of my head, I can think of two instances where non-recent blocks would be needed:

* snapshots
* importing via recovery mode (zpool import -F mypool)

For the latter, given that each vdev label can hold up to 128 uberblocks, a recovery-mode import can go back at least 128 transactions for a single non-mirrored device, so in the worst case you'd need to avoid TRIMing anything at least 128 transactions back. Of course, if you have a pair of mirrored vdevs/disks, and each one has 128 uberblocks, that's potentially 256 txgs that you can recover from (and it goes up with more vdevs, of course). That may be excessive, but perhaps there could be a tunable sysctl for the maximum distance to go back TRIMing (defaulting to 128? 64? 32?).

I'm not sure how ZFS keeps track of snapshots: is there something in-memory, or is it necessary to walk the tree? Perhaps get the list of snapshots, find the oldest birth time (i.e., the smallest txg), and TRIM only blocks with a txg less than that number?

Given that txgs are committed every 5-30s, and no I/O is done between them, perhaps that idle time could be used for sending TRIM commands? Presumably the Oracle folks are looking at this internally as well.
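The worst-case arithmetic above can be condensed into a tiny helper. This is a hedged sketch with hypothetical names: in a real implementation the rewind window would come from the vdev labels and the snapshot bound from the dataset list, and whether a snapshot bound is even needed depends on how freed-vs-referenced blocks interact with snapshots:

```python
# 'How far back is it safe to TRIM?' -- bounded by the uberblock history
# that a recovery-mode import (zpool import -F) can rewind to, and
# optionally by the oldest snapshot's birth txg, per the suggestion above.

UBERBLOCKS_PER_LABEL = 128  # uberblocks kept in each vdev label

def trim_horizon(current_txg, rewind_window=UBERBLOCKS_PER_LABEL,
                 oldest_snapshot_txg=None):
    """Oldest txg that must stay recoverable; never TRIM at or past it."""
    horizon = current_txg - rewind_window
    if oldest_snapshot_txg is not None:
        horizon = min(horizon, oldest_snapshot_txg)
    return horizon

def safe_to_trim(freed_txg, current_txg, **kw):
    return freed_txg < trim_horizon(current_txg, **kw)

assert safe_to_trim(100, 300)        # well outside the 128-txg rewind window
assert not safe_to_trim(250, 300)    # zpool import -F might still need it
assert not safe_to_trim(100, 300, oldest_snapshot_txg=90)  # snapshot bound
```

The proposed tunable sysctl would simply replace the `rewind_window` default.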
Re: [zfs-discuss] ZFS and TRIM
On 2/4/2011 7:39 AM, Christopher George wrote:
> It is true that a Flash based SSD will be adversely impacted by ZFS not
> supporting TRIM, especially for the ZIL accelerator. But a DRAM based
> SSD is immune to TRIM support status and thus unaffected.

Bottom line here is this: for a ZIL, you have a hierarchy of performance, each tier roughly two orders of magnitude faster than the previous one:

1. hard drive
2. NAND-based SSD
3. DRAM-based SSD

You'll still get a very noticeable improvement from using a NAND (flash) SSD over not using a dedicated ZIL device at all; it just won't be the improvement promised on the SSD packaging. If that performance isn't sufficient for you, then a DRAM SSD is your best bet.

Note that even if TRIM were supported, it wouldn't remove the whole penalty that a fully-written-to NAND SSD suffers. NAND requires that any block which was previously written be erased BEFORE it can be written again. TRIM only helps with using unwritten blocks inside pages, and with scheduling whole-page erasures inside the SSD controller. I can't put real numbers on it, but I would suspect that rather than suffering a 10x loss of performance, you might only lose 5x or so if TRIM were properly usable.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
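The erase-before-program rule can be illustrated with a toy model. This is deliberately simplified and not vendor firmware: real SSDs erase per block while relocating still-live pages, and the costs below are made-up relative latencies, not measurements:

```python
# Toy flash model: a page can only be programmed while erased, and
# erasing happens per block and is expensive. A TRIM from the host tells
# the controller the data is dead, so it can erase in the background and
# keep the slow erase off the foreground write path.

PAGES_PER_BLOCK = 64
WRITE_COST, ERASE_COST = 1, 100   # hypothetical relative latencies

class FlashBlock:
    def __init__(self):
        self.erased = [True] * PAGES_PER_BLOCK

    def write(self, page):
        cost = 0
        if not self.erased[page]:          # page holds stale data:
            cost += ERASE_COST             # must erase the block first
            self.erased = [True] * PAGES_PER_BLOCK
        self.erased[page] = False
        cost += WRITE_COST
        return cost

    def trim(self):
        # Host declared the contents dead; erase off the write path.
        self.erased = [True] * PAGES_PER_BLOCK

fresh, dirty = FlashBlock(), FlashBlock()
dirty.write(0)                   # dirty block now has stale data in page 0
dirty_cost = dirty.write(0)      # rewrite: foreground erase + program
fresh_cost = fresh.write(0)      # fresh (or trimmed) block: program only
assert fresh_cost == 1 and dirty_cost == 101
```

Even with TRIM, sustained writes to a full drive eventually outrun background erasure, which is consistent with the point above that TRIM reduces rather than eliminates the fully-written penalty.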
Re: [zfs-discuss] ZFS and L2ARC memory requirements?
> 100TB storage? Cool! What is the hardware? How many discs? Gief me ze
> hardware! :oP

We have two 100TB boxes running as Bacula storage agents, each a Supermicro box with 40 disks (two 2.5" 500GB drives internally for the rpool, plus 2 SSDs for the SLOG). The remaining 35 slots also house two L2ARC devices (Micron, formerly Crucial RealSSD C300). The data drives are WD Black, and they seem fairly stable, although we have seen some bad iostat numbers from a few of them (those were returned). The storage consists mainly of 11 RAIDz2 VDEVs of seven 2TB drives each.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] ZFS and L2ARC memory requirements?
> The storage consists mainly of 11 VDEVs, 7 2TB drives each, in RAIDz2.

Add a 45-drive JBOD to that to make the math work. The JBOD is a similar Supermicro box, but without the motherboard, attached via SAS.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/