Re: [zfs-discuss] what have you been buying for slog and l2arc?
It depends on the model. Consumer models are less likely to immediately flush. My understanding is that this is done in part to allow some write coalescing and reduce the number of P/E cycles. Enterprise models should either flush, or contain a supercapacitor that provides enough power for the drive to finish writing any data in its buffer.

My Home Fusion SSD runs on banana peels and eggshells and uses a Flux Capacitor. I've never had a failure.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] what have you been buying for slog and l2arc?
On 08/07/2012 02:18 AM, Christopher George wrote:
> > I mean this as constructive criticism, not as angry bickering. I totally respect you guys doing your own thing.
> Thanks, I'll try my best to address your comments...

Thanks for your kind reply, though there are some points I'd like to address, if that's okay.

> > *) Increased capacity for high-volume applications.
> We do have a select number of customers striping two X1s for a total capacity of 8GB, but for a majority of our customers 4GB is perfect. Increasing capacity obviously increases the cost, so we wanted the baseline capacity to reflect a solution to most, but not every, need.

Certainly, for most uses this isn't an issue. I just threw it in there: considering how cheap DRAM and flash are nowadays, and how easy it is to create disk pools that push 2 GB/s in write throughput, I was hoping you guys would keep pace with that (getting 4GB of sync writes into the txg commit window can be tough, but not unthinkable). In any case, by dismissing the point with "simply get two" you are effectively doubling my slog costs, which, given that the recommended practice is a mirrored slog, would mean I have to get four X1s. That's $8k at list prices and 8 full-height PCI-e slots used up (seeing as an X1 is wider than a standard PCI-e card). Not many systems can do that (that's why I said solder the DRAM and go low-profile).

> > *) Remove the requirement to have an external UPS (couple of supercaps? microbattery?)
> Done! We will be formally introducing an optional DDRdrive SuperCap PowerPack at the upcoming OpenStorage Summit.

Great! Though I suppose that will inflate the price even further (seeing as you used the word "optional").

> > *) Use cheaper MLC flash to lower cost - it's only written to in case of a power outage anyway, so lower write cycles aren't an issue, and modern MLC is almost as fast as SLC at sequential IO (within 10% usually).
> We will be staying with SLC not only for performance but longevity/reliability.
> Check out the specifications (i.e. erase/program cycles and required ECC) for a modern 20 nm MLC chip and then let me know if this is where you *really* want to cut costs :)

MLC is so much cheaper that you can simply slap on twice as much and use the rest for ECC, mirroring, or simply overprovisioning sectors. The common practice for extending the life of MLC is short-stroking it, i.e. using only a fraction of the capacity. E.g. a 40GB MLC unit with 5-10k cycles per cell can be turned into a 4GB unit (with the controller providing wear leveling) with effectively 50-100k cycles (that's SLC territory) for about a hundred bucks. Also, since I'm already mirroring it, with ZFS checksums providing integrity checking, your argument simply doesn't hold up. Oh, and don't count on Illumos missing support for SCSI Unmap or SATA TRIM forever; work is underway to rectify that situation.

> > *) PCI Express 3.0 interface (perhaps even x4)
> Our product is FPGA based and the PCIe capability is the biggest factor in determining component cost. When we introduced the X1, the FPGA cost *alone* to support just PCIe Gen2 x8 was greater than the current street price of the DDRdrive X1.

I always had a bit of an issue with non-hotswappable storage devices. What if an X1 slog dies? I need to power the machine down, open it up, take out the slog, put in another one, and power it back up. Since ZFS has slog removal support, there's no reason to go for non-hotpluggable slogs anyway. What about 6G SAS? Dual-ported, you could push around 12 Gbit/s of bandwidth to/from the device, way more than the current 250 MB/s, and get hotplug support in there too.

> > *) At least update the benchmarks on your site to compare against modern flash-based competition (not the Intel X25-E, which is seriously stone age by now...)
> I completely agree we need to refresh the website, not even the photos are representative of our shipping product (we now offer VLP DIMMs).
> We are engineers first and foremost, but an updated website is in the works. In the meantime, we have benchmarked against both the Intel 320/710 in my OpenStorage Summit 2011 presentation, which can be found at: http://www.ddrdrive.com/zil_rw_revelation.pdf

I always had a bit of an issue with your benchmarks. First off, you only ever run synthetics. They are very nice, but don't provide much in the way of real-world perspective. Try to compare on price, too: take something like a Dell R720, stick in the equivalent (in terms of cost!) of DRAM SSDs and flash SSDs (i.e. for one X1 you're looking at something like 4 Intel 710s) and run some real workloads (database benchmarks, virtualization benchmarks, etc.). Experiment beats theory, every time.

> > *) Lower price, lower price, lower price.

I can get 3-4 200GB OCZ Talos-Rs for $2k FFS. That means I could equip my machine with one to two mirrored slogs and nearly 800GB worth of L2ARC for the price of a single X1. I
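A back-of-envelope sketch of the "4GB of sync writes in the txg commit window" point above (my arithmetic, not from the thread; the 5-second txg sync interval is an assumption based on the illumos-era `zfs_txg_timeout` default, and is tunable):

```python
# How fast must sync writes arrive to fill a 4 GB slog within a single
# txg commit window? Assumes a 5-second txg sync interval.
slog_capacity_mb = 4 * 1024   # 4 GB slog (DDRdrive X1 capacity)
txg_interval_s = 5
required_mb_s = slog_capacity_mb / txg_interval_s
print(f"~{required_mb_s:.0f} MB/s of sustained sync writes")  # ~819 MB/s

# A pool pushing the 2 GB/s mentioned above would fill 4 GB in:
fill_time_s = slog_capacity_mb / 2048
print(f"{fill_time_s:.1f} s at 2 GB/s")  # 2.0 s
```

So the "tough, but not unthinkable" framing holds: a fast pool can saturate a 4 GB slog well within one commit interval.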
Re: [zfs-discuss] what have you been buying for slog and l2arc?
On Mon, 6 Aug 2012, Christopher George wrote:
> > I mean this as constructive criticism, not as angry bickering. I totally respect you guys doing your own thing.
> Thanks, I'll try my best to address your comments...
> > *) At least update the benchmarks on your site to compare against modern flash-based competition (not the Intel X25-E, which is seriously stone age by now...)
> I completely agree we need to refresh the website, not even the photos are representative of our shipping product (we now offer VLP DIMMs). We are engineers first and foremost, but an updated website is in the works. In the meantime, we have benchmarked against both the Intel 320/710 in my OpenStorage Summit 2011 presentation, which can be found at: http://www.ddrdrive.com/zil_rw_revelation.pdf

Very impressive IOPS numbers, though I have some thoughts on the benchmarking method itself. IMHO the comparison shouldn't be raw IOPS numbers on the DDRdrive itself as tested with Iometer (it's only 4GB), but real-world numbers on a real-world pool of spinning disks with the DDRdrive acting as ZIL accelerator.

I just introduced an Intel 320 120GB as ZIL accelerator for a simple zpool with two SAS disks in a raid0 configuration, and it's not as bad as in your presentation. It achieves about 50% of the possible NFS ops with the SSD as ZIL versus no ZIL (sync=disabled on oi151), and about 6x-8x the performance compared to the pool without any accelerator and sync=standard. The case with no ZIL is the upper limit one can achieve on a given pool, in my case creation of about 750 small files/sec via NFS. With the SSD it's 380 files/sec (the NFS stack is a limiting factor, too). Or about 2400 8k write IOPS with the SSD vs. 11900 IOPS with ZIL disabled, and 250 IOPS without any accelerator (GNU dd with oflag=sync). Not bad at all. This could be just good enough for small businesses and moderately sized pools.
Michael

--
Michael Hase
edition-software GmbH
http://edition-software.de
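Michael's measured figures can be restated as ratios (the raw numbers are from his post above; the derived percentages and speedups are my arithmetic):

```python
# Measured figures from the post; ratios derived here.
files_no_zil = 750    # files/sec via NFS with sync=disabled (upper bound)
files_ssd    = 380    # files/sec with the Intel 320 as slog
iops_ssd      = 2400  # 8k sync write IOPS with the SSD slog
iops_no_accel = 250   # 8k sync write IOPS with no accelerator
print(f"SSD slog reaches {files_ssd / files_no_zil:.0%} of the no-ZIL ceiling")  # 51%
print(f"SSD slog vs. bare pool: {iops_ssd / iops_no_accel:.1f}x")  # 9.6x
```

Which matches his summary: roughly half the sync=disabled ceiling, and nearly an order of magnitude over the unaccelerated pool.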
Re: [zfs-discuss] what have you been buying for slog and l2arc?
On Tue, 7 Aug 2012, Sašo Kiselkov wrote:
> MLC is so much cheaper that you can simply slap on twice as much and use the rest for ECC, mirroring, or simply overprovisioning sectors. The common practice for extending the life of MLC is short-stroking it, i.e. using only a fraction of the capacity. E.g. a 40GB MLC unit with 5-10k cycles per cell can be turned into a 4GB unit (with the controller providing wear leveling) with effectively 50-100k cycles (that's SLC territory) for about a hundred bucks. Also, since I'm already mirroring it, with ZFS checksums providing integrity checking, your argument simply doesn't hold up.

Remember he also said that the current product is based principally on an FPGA. This FPGA must be interfacing directly with the flash device, so it would need to be substantially redesigned to deal with MLC flash (probably at least an order of magnitude more complex), or else a microcontroller would need to be added to the design, with firmware handling the substantial complexities. If the flash device writes slower, then the power has to stay up longer. If the flash device reads slower, then it takes longer for the drive to come back online. Quite a lot of product would need to be sold in order to pay for both the re-engineering and the cost of running a business. Regardless, continual product re-development is necessary or else it will surely die.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
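The short-stroking arithmetic quoted above can be sketched as follows (the figures are Sašo's illustrative examples, not vendor specifications):

```python
# Short-stroking an MLC device: expose only a fraction of the raw
# capacity and let the controller's wear leveling spread writes across
# all cells, multiplying the effective per-LBA endurance.
raw_gb, exposed_gb = 40, 4
spare_factor = raw_gb // exposed_gb   # 10x spare area
for cell_cycles in (5_000, 10_000):   # the MLC endurance range cited
    effective = cell_cycles * spare_factor
    print(f"{cell_cycles} cycles/cell -> ~{effective} effective cycles")
```

Which reproduces the 50-100k effective cycles ("SLC territory") claimed in the post.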
Re: [zfs-discuss] what have you been buying for slog and l2arc?
On 08/07/2012 04:08 PM, Bob Friesenhahn wrote:
> On Tue, 7 Aug 2012, Sašo Kiselkov wrote:
> > MLC is so much cheaper that you can simply slap on twice as much and use the rest for ECC, mirroring, or simply overprovisioning sectors. The common practice for extending the life of MLC is short-stroking it, i.e. using only a fraction of the capacity. E.g. a 40GB MLC unit with 5-10k cycles per cell can be turned into a 4GB unit (with the controller providing wear leveling) with effectively 50-100k cycles (that's SLC territory) for about a hundred bucks. Also, since I'm already mirroring it, with ZFS checksums providing integrity checking, your argument simply doesn't hold up.
> Remember he also said that the current product is based principally on an FPGA. This FPGA must be interfacing directly with the flash device, so it would need to be substantially redesigned to deal with MLC flash (probably at least an order of magnitude more complex), or else a microcontroller would need to be added to the design, with firmware handling the substantial complexities. If the flash device writes slower, then the power has to stay up longer. If the flash device reads slower, then it takes longer for the drive to come back online.

Yeah, I know, but then you can interface with an existing industry-standard flash controller; there's no need to design your own (reinvent the wheel). The choice of an FPGA is good for some things, but flexibility in exchanging components certainly isn't one of them. If I were designing something akin to the X1, I'd go with a generic embedded CPU design (e.g. a PowerPC core) interfacing with standard flash components and running the primary front-end from the chip's on-board DRAM.
I mean, just to give you some perspective: for $2k I could build a full computer with 8GB of mirrored ECC DRAM which interfaces with the host machine via an off-the-shelf 6G SAS HBA (with two 4x-wide 6G SAS ports), or perhaps even an InfiniBand adapter with RDMA, includes a small SSD in its SATA bay, and carries a tiny UPS battery to run the whole thing for the few minutes needed to write the DRAM contents to flash in case of a power outage (the current X1 doesn't even include this in its base design). And that's something I could do with off-the-shelf components for less than $2k (probably a whole lot less) at a production volume of _1_.

> Quite a lot of product would need to be sold in order to pay for both re-engineering and the cost of running a business.

Sure, that's why I said it's David vs. Goliath. However, let's be honest here: the X1 isn't a terribly complex product. It's quite literally a tiny computer with some DRAM and a feature to dump the DRAM contents to flash (and read them back later) in case power fails. That's it.

Cheers,
--
Saso
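Bob's earlier point that slower flash stretches the backup-power requirement is easy to quantify. A rough sketch (the flash write speeds here are illustrative assumptions, not X1 or any vendor's specifications):

```python
# Time to destage a DRAM buffer to flash after power loss: the
# supercap/battery must supply power for at least this long.
dram_mb = 4 * 1024  # 4 GB of DRAM to dump (X1-sized buffer)
for flash_write_mb_s in (100, 200, 400):  # assumed flash write speeds
    holdup_s = dram_mb / flash_write_mb_s
    print(f"{flash_write_mb_s} MB/s flash -> {holdup_s:.1f} s of holdup power")
```

Halving the flash write speed doubles the required holdup time, which is why the choice of MLC vs. SLC feeds directly into the power-loss design.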
Re: [zfs-discuss] what have you been buying for slog and l2arc?
> Very impressive iops numbers. Although I have some thoughts on the benchmarking method itself. Imho the comparison shouldn't be raw iops numbers on the ddrdrive itself as tested with iometer (it's only 4gb),

The purpose of the benchmarks presented is to isolate the inherent capability of just the SSD in a simple/synthetic/sustained Iometer 4KB random write test. This test successfully illuminates a critical difference between a flash-only and a DRAM/SLC based SSD. Flash-only SSD vendors are *less* than forthright in their marketing when specifying their 4KB random write capability. I am surprised vendors are not called out for marketing FOB (fresh out of the box) results that (even with TRIM support) are not sustainable. Intel was a notable exception, until they too introduced SSDs based on SandForce controllers.

In the section prior to the benchmarks, titled "ZIL Accelerator access pattern: random and/or sequential?", I show an example workload and how it translates to an actual log device's access pattern. It clearly shows a wide (21-71%) spectrum of random write accesses. So before even presenting any Iometer results, I don't believe I indicate or even imply that real-world workloads will somehow be 100% 4KB random write based. For the record, I agree with you: they are obviously not!

> real world numbers on a real world pool consisting of spinning disks with ddrdrive acting as zil accelerator.

Benchmarking is frustrating for us also, as what is a "real world pool"? And if we picked one to benchmark, how relevant would it be to others?

1) number of vdevs (we see anywhere from one to massive)
2) vdev configuration (only mirrored pairs to 12-disk raidz2)
3) HDD type (low-rpm green HDDs to SSD-only pools)
4) host memory size (we see not enough to 192GB+)
5) number of host CPUs (you get the picture)
6) network connection (1GB to multiple 10GB)
7) number of network ports
8) direct connect to client or through a switch (or switches)

Is the ZFS pool accessed using NFS or iSCSI?
What is the client OS? What is the client configuration? What is the workload composition (read/async write/sync write)? What is the workload access pattern (sequential/random)? ...

> This could be just good enough for small businesses and moderate sized pools.

No doubt; we are also very clear on who we target (enterprise customers). The beauty of ZFS is the flexibility of its implementation: by supporting multiple log device types and configurations, it ultimately enables a broad range of performance capabilities!

Best regards,
Chris

--
Christopher George
cgeorge at ddrdrive.com
http://www.ddrdrive.com