Eric D. Mudama wrote:
On Fri, Jan 1 at 21:21, Erik Trimble wrote:
That all said, it certainly would be really nice to get a SSD
controller which can really push the bandwidth, and the only way I
see this happening now is to go the "stupid" route, and dumb down the
controller as much as possible. I really think we just want the
controller to Do What I Say, and not try any optimizations or such.
There's simply much more benefit to doing the optimization up at the
filesystem level than down at the device level. For a trivial case,
consider the dreaded read-modify-write problem of MLCs: to write a
single bit, a whole page has to be read, then the page recomposed
with the changed bits, before being written back. If the filesystem
were aware that the drive had this kind of issue, then in-RAM caching
would almost always allow the first "read" cycle to be avoided, and
performance goes back to that of a typical Copy-on-Write style
stripe write.
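A toy model may make the penalty described above concrete. This is only a sketch under assumptions: the 4 KB page size, the ToySSD class, and both helper functions are illustrative, not real ZFS or firmware code. A controller-side read-modify-write costs two device operations, while a filesystem that already holds the page in RAM pays only the final write:

```python
PAGE_SIZE = 4096  # bytes; a typical NAND page size (assumption)

class ToySSD:
    """Toy device: pages can only be written whole; counts device I/O ops."""
    def __init__(self, npages):
        self.pages = [bytes(PAGE_SIZE)] * npages
        self.io_ops = 0

    def read_page(self, n):
        self.io_ops += 1
        return self.pages[n]

    def write_page(self, n, data):
        assert len(data) == PAGE_SIZE
        self.io_ops += 1
        self.pages[n] = data

def set_byte_via_controller(ssd, page_no, off, val):
    # Controller-side R-M-W: the device must read the whole page
    # before it can rewrite it with the changed byte.
    buf = bytearray(ssd.read_page(page_no))   # the extra "read" cycle
    buf[off] = val
    ssd.write_page(page_no, bytes(buf))

def set_byte_via_fs_cache(ssd, cache, page_no, off, val):
    # Filesystem-side: the page is already in the in-RAM cache, so
    # only one full-page write ever touches the device.
    buf = bytearray(cache[page_no])
    buf[off] = val
    cache[page_no] = bytes(buf)
    ssd.write_page(page_no, cache[page_no])

ssd = ToySSD(8)
set_byte_via_controller(ssd, 0, 10, 0xFF)
print(ssd.io_ops)   # 2 device ops: read + write

ssd2 = ToySSD(8)
cache = {0: ssd2.pages[0]}          # page already cached in RAM
set_byte_via_fs_cache(ssd2, cache, 0, 10, 0xFF)
print(ssd2.io_ops)  # 1 device op: the write alone
```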
I am not convinced that a general-purpose CPU, running other software
in parallel, can be timely and responsive enough to maximize
bandwidth in an SSD without specialized hardware support. That
hardware support is, of course, the controller that exists on modern
SSDs.
Why not? My argument is the one that ZFS as a whole is founded on:
that modern CPUs have so many spare cycles that it's silly to pay extra
for a smart raid controller when we can just borrow time on the main
CPU. It seems to work out just fine for hard drives, so why not for
SSDs (which, while much faster than HDs, are still many orders of
magnitude slower than DMA transfers)?
Drive vendors abstracted these interfaces a long time ago, creating
Integrated Drive Electronics (IDE). Bringing all of that logic back
up into the CPU would likely not help meaningfully. Yes, it would
likely be cheaper, but I doubt it would be faster or more reliable.
I'm not advocating a return to something like the old IPI technology (oh
boy, did I just date myself there...). That's silly. By "dumb", I'm
referring to things on the level of IDE - a disk controller that
handles nothing more than internal (to the disk) bad block remapping,
LBA-to-physical-block mapping, etc. In the case of a "stupid" SSD
controller, that would entail sufficient smarts to do wear leveling,
LBA mapping, bad page/block detection and marking, and very little else.
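For illustration only, the bookkeeping such a "stupid" controller would need can be sketched in a few lines. The DumbController class, its least-worn-block policy, and every name here are my assumptions for the sketch, not any vendor's design:

```python
class DumbController:
    """Minimal sketch: LBA mapping, naive wear leveling, bad-block marking."""
    def __init__(self, nblocks):
        self.lba_map = {}                  # LBA -> physical block
        self.erase_counts = [0] * nblocks  # per-block wear counters
        self.bad = set()                   # blocks marked unusable
        self.free = set(range(nblocks))    # blocks available for writes
        self.store = {}                    # physical block -> contents

    def mark_bad(self, phys):
        # Bad block marking: never hand this block out again.
        self.bad.add(phys)
        self.free.discard(phys)

    def write(self, lba, data):
        # Wear leveling: place the write on the least-worn good block,
        # then retire the LBA's previous block back to the free pool.
        phys = min(self.free - self.bad, key=lambda b: self.erase_counts[b])
        self.free.discard(phys)
        self.store[phys] = data
        old = self.lba_map.get(lba)
        if old is not None:
            self.erase_counts[old] += 1    # the old block gets erased
            self.free.add(old)
        self.lba_map[lba] = phys
        return phys
```

Rewriting the same LBA twice lands on two different physical blocks; that, plus the bad-block set, is about all the intelligence being argued for here.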
I also am not convinced that your described RMW semantics are used in
any modern NAND devices. Those problems were solved years ago. The
granularity of the implementation has implications for performance in
some workloads, but I believe only those old JMicron-based SSDs did
block-level RMW, and hence wound up doing roughly 2-3 IOPS in random
workloads with MLC drives.
ALL modern MLC-based SSDs have exactly the problem I've described as the
example above. It's a defining characteristic of the Multi-Level Cell
design. A nice modern Intel X25-M can see a loss of 50-80% of its
theoretical maximum write performance once it runs out of unused cells
to write to. And that's with the fancy firmware.
SSDs (with good controllers) really strut their stuff when in-RAM
caching isn't working anyway. If in-RAM was good enough, then why
bother with SSD? Just have a spun-down rotating drive at 1/5th-1/15th
the cost.
--eric
No, they don't (strut, that is). MLC-based SSDs (and, even SLC-based
ones, to a lesser extent) have a very significant write penalty. Much
of the "smarts" that goes into current-gen SSDs is an attempt to
overcome this design limitation. What Bob and I are saying is that
locating the "smarts" in the SSD controller is misguided. Having this
intelligence located in the OS/Filesystem driver is a far better idea,
as the system has a much more global understanding of where
optimizations can occur, and can make the appropriate choices. And,
frankly, it's far easier to update a filesystem driver than it is to
reflash firmware on an SSD, should any changes be necessary.
The example I was giving for R-M-W is that it is /highly/ likely that
the OS already has a significant chunk of the file to be modified
in the buffer cache (the ARC, in ZFS's case). So, if ZFS is talking to
a stupid SSD, it knows that it cannot just issue a single block write
should a bit in the file change. Instead, ZFS will know that it should
issue a TRIM command (or something like it) to have the SSD mark the old
page (where the bit(s) change) deleted, then use the cached copy in the
ARC as the template to build a full page with the new bit(s) in it,
and then issue a full page write to the SSD. This avoids having the SSD
do the read-modify-write itself. The worst case is that the SSD will
have to read the whole page to get it back into the ARC, but the
typical case - where the page is already cached - is far more likely.
So, the typical case is 1 I/O, versus 3 on a "smart" SSD. My approach
uses more interface (SAS/SATA/etc.) bandwidth to the SSD, but that's
OK, since there's plenty to spare.
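The flow above can be sketched as code. The trim/write_page primitives and every name here are hypothetical stand-ins, not real ZFS interfaces; the point is simply that when the page is already cached in RAM, only the single full-page write reaches the device:

```python
PAGE = 4096  # bytes; illustrative page size (assumption)

class StupidSSD:
    """Dumb device model: counts the operations that actually hit media."""
    def __init__(self):
        self.live = {}        # page number -> contents
        self.device_ops = 0

    def trim(self, pno):
        # TRIM-like hint: mark the stale page deleted. Treated as a
        # cheap metadata operation, not a media read/write.
        self.live.pop(pno, None)

    def write_page(self, pno, data):
        assert len(data) == PAGE
        self.device_ops += 1
        self.live[pno] = data

def fs_modify(ssd, ram_cache, pno, off, val):
    """Typical case: the page is already cached, so no device read occurs."""
    buf = bytearray(ram_cache[pno])      # template comes from RAM, not the SSD
    buf[off] = val
    ram_cache[pno] = bytes(buf)
    ssd.trim(pno)                        # retire the old page
    ssd.write_page(pno, ram_cache[pno])  # one full-page write

ssd = StupidSSD()
cache = {7: bytes(PAGE)}                 # page already in the in-RAM cache
fs_modify(ssd, cache, 7, 0, 1)
print(ssd.device_ops)  # 1
```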
And, going back to the article where they're speculating having
something like RAID and Dedup integrated at the SSD controller-level,
this is just a /really/ bad idea.
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss