Re: [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)
Bob Friesenhahn wrote:
> Of course, it is my understanding that the zfs slog is written sequentially so perhaps this applies instead:

Actually, reading up on these drives I've started to wonder about the slog writing pattern. While these drives do seem to do a great job at random writes, most of the promise shows at sequential writes. So: does the slog attempt to write sequentially through the space given to it?

Also, there are all sorts of analyses out there about how the drives always attempt to write new data to the pages and blocks they know are empty, since they can't overwrite one page (usually 4k) without erasing the whole (512k) block the page is in. This leads to a drop in write performance after all the space (both the space you paid for, and any extra space the vendor put in to work around this issue) has been used once.

This shows up in regular filesystems because when a file is deleted, the drive only sees a new (over)write of some metadata so the OS can record that the file is gone; the drive is never told that the blocks the file was occupying are now free and can be pre-erased at the drive's convenience. The drive vendors have come up with a new TRIM command, which some OSes (Win7) are talking about supporting in their filesystems.

Obviously, for use only as a slog device ZFS itself doesn't need to know how to use TRIM (until people start using SSDs as regular pool devices), but I would think that the slog code would need to use it in order to keep write speeds up and latencies down. No?

If so, what's the current consensus, thoughts, plans, etc. on if and when TRIM will be usable in Solaris/ZFS?

-Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
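The erase-block constraint above can be sketched as a toy model: a fresh drive has pre-erased blocks, so 4k page writes are cheap, but once every block has been programmed once, overwriting a single page costs an erase plus a rewrite of the whole 512k block. The page/block sizes are the ones quoted above; the flash timings are invented for illustration, not any vendor's numbers.

```python
# Toy model of the SSD performance cliff described above.  A 4 KiB page
# can only be programmed into a pre-erased 512 KiB block; once every
# block on the drive has been written once, a single page overwrite
# implies erasing a block and re-programming its pages.
PAGE = 4 * 1024
BLOCK = 512 * 1024
PAGES_PER_BLOCK = BLOCK // PAGE          # 128
PROGRAM_US, ERASE_US = 100, 2000         # invented flash timings

def write_cost_us(pages_written_so_far, total_pages):
    """Cost of one 4 KiB write on a drive with no TRIM knowledge."""
    if pages_written_so_far < total_pages:
        return PROGRAM_US                # fresh, pre-erased page: cheap
    # drive has been filled once: erase a block, re-program its pages
    return ERASE_US + PAGES_PER_BLOCK * PROGRAM_US

total = 4 * PAGES_PER_BLOCK              # tiny 2 MiB "drive"
costs = [write_cost_us(i, total) for i in range(total + 1)]
print(costs[0], costs[-1])               # 100 vs 14800: the latency cliff
```

The jump from 100 µs to 14800 µs in this toy is the "drop in write performance after all the space has been used once" that the reviews describe.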
Re: [zfs-discuss] slog writing patterns vs SSD tech.
> km == Kyle McDonald kmcdon...@egenera.com writes:

    km> these drives do seem to do a great job at random writes, most of
    km> the promise shows at sequential writes, so does the slog attempt
    km> to write sequentially through the space given to it?

When writing to the slog, some user-visible application has to wait for the slog to return that the write was committed to disk. So whether it's random or not, you're waiting for it, and io/s translates closely into latency because the writes cannot be batched with the normal level of aggressiveness (or they should just go to the main pool, which will eventually have to handle the entire workload anyway). io/s is the number that matters.

``but but but but!''

*thwack* NO! Everyone who is using the code, writing the code, and building the systems says: io/s is the number that matters. If you've got some experience otherwise, fine, odd things turn up all the time, but AFAICT the consensus is clear right now.

    km> they can't overwrite one page (usually 4k) without erasing the
    km> whole (512k) block the page is in.

Don't presume to get into the business of their black box so far. That's almost certainly not what they do. They probably do COW like ZFS (and yaffs and jffs2 and ubifs), so they will do the 4k writes to partly-empty pages until the page is full. In the background a gc thread will evacuate and rewrite pages that have become spattered with unreferenced sectors. They will write to the flash filesystem to keep track of things about itself, like half-erased cells, toasted cells, per-cell erase counts. Then there is probably a defragmenter thread, or else the gc is itself data-reorganizing. And there is some lookup state kept in DRAM during operation, and reconstructed from post-mortem observation of what's in the FLASH at boot, like with any filesystem.
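The COW-plus-gc design sketched above can be written down as a toy flash translation layer: logical overwrites go to fresh slots, the old copies become garbage, and a gc pass evacuates any still-live pages out of dirty blocks so the blocks can be erased. This is entirely hypothetical structure (real firmware is closed, as the post says), just to make the mechanism concrete.

```python
# Hypothetical COW flash translation layer: overwrites never touch the
# old copy in place; gc erases blocks that have become spattered with
# unreferenced (None) slots, relocating any live pages first.
PAGES_PER_BLOCK = 4   # toy size

class FTL:
    def __init__(self, nblocks):
        self.map = {}                                 # logical page -> (block, slot)
        self.blocks = [[] for _ in range(nblocks)]    # slots: logical id or None

    def write(self, lpage):
        old = self.map.get(lpage)
        if old is not None:                           # COW: old copy becomes garbage
            b, s = old
            self.blocks[b][s] = None
        for b, blk in enumerate(self.blocks):
            if len(blk) < PAGES_PER_BLOCK:            # partly-empty block available
                blk.append(lpage)
                self.map[lpage] = (b, len(blk) - 1)
                return
        raise RuntimeError("no free page: gc must run first")

    def gc(self):
        """Evacuate live pages from full blocks containing garbage; erase them."""
        erased = 0
        for b, blk in enumerate(self.blocks):
            if None in blk and len(blk) == PAGES_PER_BLOCK:
                live = [p for p in blk if p is not None]
                self.blocks[b] = []                   # erase the whole block
                erased += 1
                for p in live:                        # rewrite live data elsewhere
                    del self.map[p]
                    self.write(p)
        return erased

ftl = FTL(nblocks=3)
for p in range(8):        # fill two blocks with logical pages 0..7
    ftl.write(p)
for p in range(4):        # overwrite 0..3: old copies turn to garbage
    ftl.write(p)
freed = ftl.gc()
print(freed)              # one all-garbage block erased and reusable
```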
Just look at the observed performance in microbenchmarks or in actual use, rather than trying to reverse-reason about these fancy and otherwise-unobtainable closed-source filesystems, which is what they are really selling, in disk/``game cartridge'' form factor.

    km> The Drive vendors have come up with a new TRIM command, which
    km> some OS's (Win7) are talking about supporting in their
    km> Filesystems.

This would be useful for VMs with thin-provisioned disks, too.

    km> I would think that the slog code would need to use it in order
    km> to keep write speeds up and latencies down. No?

Read the goofy gamer site review, please. No, not with the latest Intel firmware, it's not needed.
Re: [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)
On Jul 24, 2009, at 10:46 AM, Kyle McDonald wrote:
> Bob Friesenhahn wrote:
>> Of course, it is my understanding that the zfs slog is written sequentially so perhaps this applies instead:
> Actually, reading up on these drives I've started to wonder about the slog writing pattern. While these drives do seem to do a great job at random writes, most of the promise shows at sequential writes, so does the slog attempt to write sequentially through the space given to it?

Short answer is yes. But you can measure it with iopattern:

http://www.richardelling.com/Home/scripts-and-programs-1/iopattern

Use the -d option to look at your slog device.

 -- richard
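The kind of classification a tool like iopattern reports can be sketched in a few lines: count an I/O as sequential when its starting block immediately follows the previous I/O's last block. The traces below are invented sample data (offsets in 512-byte blocks), not real slog output; the real script works from live DTrace events.

```python
# Rough re-implementation of the sequential/random split an iopattern-style
# tool reports, to show what "the slog writes sequentially" looks like.
def pct_sequential(ios):
    """ios: list of (start_block, nblocks) in completion order."""
    seq = 0
    for prev, cur in zip(ios, ios[1:]):
        if cur[0] == prev[0] + prev[1]:   # starts right after previous I/O
            seq += 1
    return 100 * seq // (len(ios) - 1)

slog_like = [(0, 8), (8, 8), (16, 8), (24, 8), (32, 8)]      # back-to-back
random_like = [(0, 8), (400, 8), (72, 8), (900, 8), (16, 8)]
print(pct_sequential(slog_like), pct_sequential(random_like))  # 100 0
```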
Re: [zfs-discuss] slog writing patterns vs SSD tech.
Miles Nordin wrote:
>     km> these drives do seem to do a great job at random writes, most
>     km> of the promise shows at sequential writes, so does the slog
>     km> attempt to write sequentially through the space given to it?
>
> *thwack* NO! Everyone who is using the code, writing the code, and building the systems says: io/s is the number that matters. If you've got some experience otherwise, fine, odd things turn up all the time, but AFAICT the consensus is clear right now.

Yeah, I know. I get it. I screwed up and used the wrong term, OK? I agree with you. Still, when all the previously erased pages are gone, write latencies go up (drastically - in some cases worse than a spinning HD) and io/s goes down. So what I really wanted to get into was the question below.

>     km> they can't overwrite one page (usually 4k) without erasing the
>     km> whole (512k) block the page is in.
>
> don't presume to get into the business of their black box so far.

I'm not. Guys like this are: http://www.anandtech.com/storage/showdoc.aspx?i=3531p=8

> That's almost certainly not what they do. They probably do COW like ZFS (and yaffs and jffs2 and ubifs), so they will do the 4k writes to partly-empty pages until the page is full. In the background a gc thread will evacuate and rewrite pages that have become spattered with unreferenced sectors.

That's where the problem comes in. They have no knowledge of the upper filesystem, and don't know which previously written blocks are still referenced. When the OS filesystem rewrites a directory to remove a pointer to the string of blocks the file used to use, and updates its list of which LBA sectors are now free vs. in use, it probably happens pretty much exactly like you say. But that doesn't let the SSD mark the sectors the file used as unreferenced, so the gc thread can't evacuate them ahead of time and add them to the empty page pool.
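That gap between the filesystem's view and the drive's view is exactly what TRIM is meant to close, and it can be put as a toy: the FTL only learns a sector is garbage when the same LBA is overwritten, so deleted-file sectors look live forever unless the OS says otherwise. Hypothetical model, not any real drive's behavior.

```python
# Toy model of the FTL's view of liveness.  Without TRIM, a deleted
# file's LBAs stay "live" from the drive's perspective, so gc cannot
# pre-erase them; TRIM tells the drive those sectors are free.
class FTLView:
    def __init__(self):
        self.live = set()              # LBAs the drive believes hold live data

    def write(self, lba):
        self.live.add(lba)

    def trim(self, lbas):
        self.live -= set(lbas)         # OS: "these sectors are free now"

    def reclaimable(self, capacity):
        return capacity - len(self.live)

d = FTLView()
file_lbas = range(100)
for lba in file_lbas:                  # OS writes a 100-sector file
    d.write(lba)

# OS deletes the file; without TRIM the drive never hears about it,
# so gc still treats all 100 sectors as live.
print(d.reclaimable(capacity=128))     # 28

d.trim(file_lbas)                      # with TRIM, gc can pre-erase them
print(d.reclaimable(capacity=128))     # 128
```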
>     km> The Drive vendors have come up with a new TRIM command, which
>     km> some OS's (Win7) are talking about supporting in their
>     km> Filesystems.
>
> this would be useful for VM's with thin-provisioned disks, too.

True. Keeping or putting the 'holes' back in the 'holey' disk files when the VM frees up space would be very useful.

>     km> I would think that the slog code would need to use it in order
>     km> to keep write speeds up and latencies down. No?
>
> read the goofy gamer site review please. No, not with the latest intel firmware, it's not needed.

I did read at least one review that compared old and new firmware on the Intel M model. In that, I'm pretty sure they still saw a performance hit (in latency) when the entire drive had been written to. It may have taken longer to hit, and it may not have been as drastic, but it was still there. Which review are you talking about?

So what if Intel has fixed it? Not everyone is going to use the Intel drives. If the TRIM command (assuming it can help at all) can keep the other brands and models performing close to how they performed when new, then I'd say it's useful in the ZFS slogs too. Just because one vendor might have made it unnecessary doesn't mean it is for everyone. Does it?

-Kyle
Re: [zfs-discuss] slog writing patterns vs SSD tech.
On Fri, 24 Jul 2009, Kyle McDonald wrote:
> http://www.anandtech.com/storage/showdoc.aspx?i=3531p=8

This is an interesting test report. Something quite interesting for zfs: if the write rate is continually high, then the write performance will be limited by the FLASH erase performance, regardless of the use of something like TRIM. TRIM only improves write latency in the case that the FLASH erase is able to keep ahead of the write rate. If the writes are bottlenecked, then using TRIM is likely to decrease the write performance. If data is written at an almost constant rate, then a time may come where the drive suddenly hits the wall and is no longer able to erase the data as fast as it comes in.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
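The "hits the wall" argument is simple arithmetic: model a pool of pre-erased pages drained at the write rate and refilled at the background erase rate. Sustained writes above the erase rate empty the pool in finite time no matter how large it is; below it, the drive keeps up indefinitely. The rates and pool size below are invented for illustration.

```python
# Write rate vs. background erase rate, as Bob describes: TRIM only
# keeps latency down while erase keeps ahead of incoming writes.
def time_to_wall(spare_pages, write_rate, erase_rate):
    """Seconds until no pre-erased page remains, or None if erase keeps up.

    spare_pages: pre-erased pages available when the load starts
    write_rate, erase_rate: pages/second consumed vs. replenished
    """
    if write_rate <= erase_rate:
        return None                           # pool never drains
    return spare_pages / (write_rate - erase_rate)

print(time_to_wall(10_000, write_rate=1_500, erase_rate=1_000))  # 20.0
print(time_to_wall(10_000, write_rate=900, erase_rate=1_000))    # None
```

Once the pool is empty, every write pays the erase cost inline, which is the sudden latency jump the test report shows.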
Re: [zfs-discuss] slog writing patterns vs SSD tech.
On Jul 24, 2009, at 2:33 PM, Bob Friesenhahn wrote:
> TRIM only improves write latency in the case that the FLASH erase is able to keep ahead of the write rate. [...] If data is written at an almost constant rate, then a time may come where the drive suddenly hits the wall and is no longer able to erase the data as fast as it comes in.

Yep. Good thing we can "zpool add" a log to spread the load.

NB: zpool add log != zpool attach log (add stripes writes across another log device; attach would mirror an existing one).

 -- richard
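Why adding log devices helps with the erase-rate wall: spreading the synchronous writes divides the per-device write rate, which can keep each SSD under the rate its background erase can sustain. A round-robin toy model of that load split (ZFS's actual slog allocation policy may differ):

```python
# Toy round-robin distribution of slog writes across N log devices;
# each device sees roughly total/N of the I/Os.
from itertools import cycle

def per_device_iops(total_iops, nlogs):
    devices = [0] * nlogs
    for _, d in zip(range(total_iops), cycle(range(nlogs))):
        devices[d] += 1                # next I/O goes to the next device
    return devices

print(per_device_iops(3000, 1))        # [3000]
print(per_device_iops(3000, 3))        # [1000, 1000, 1000]
```

If a single SSD hits the wall around 1500 writes/s in the earlier arithmetic, three of them at 1000 writes/s each would stay under it.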