Re: [zfs-discuss] what have you been buying for slog and l2arc?
> Very impressive iops numbers. Although I have some thoughts on the
> benchmarking method itself. Imho the comparison shouldn't be raw iops
> numbers on the ddrdrive itself as tested with iometer (it's only 4gb),

The purpose of the benchmarks presented is to isolate the inherent capability of just the SSD in a simple/synthetic/sustained Iometer 4KB random write test. This test successfully illuminates a critical difference between a Flash-only and a DRAM/SLC based SSD.

Flash-only SSD vendors are *less* than forthright in their marketing when specifying their 4KB random write capability. I am surprised vendors are not called out for marketing FOB (fresh out of the box) results that, even with TRIM support, are not sustainable. Intel was a notable exception until it too introduced SSDs based on SandForce controllers.

In the section prior to the benchmarks, titled "ZIL Accelerator access pattern random and/or sequential", I show an example workload and how it translates to an actual log device's access pattern. It clearly shows a wide (21-71%) spectrum of random write accesses. So before even presenting any Iometer results, I don't believe I indicate or even imply that real-world workloads will somehow be 100% 4KB random write based. For the record, I agree with you: they are obviously not!

> real world numbers on a real world pool consisting of spinning disks
> with ddrdrive acting as zil accelerator.

Benchmarking is frustrating for us also, as what is a "real world" pool? And if we picked one to benchmark, how relevant would it be to others?

1) number of vdevs (we see anywhere from one to massive)
2) vdev configuration (only mirrored pairs to 12 disk raidz2)
3) HDD type (low rpm green HDDs to SSD-only pools)
4) host memory size (we see not enough to 192GB+)
5) number of host CPUs (you get the picture)
6) network connection (1GbE to multiple 10GbE)
7) number of network ports
8) direct connect to client or through a switch(es)

Is the ZFS pool accessed using NFS or iSCSI? What is the client OS? What is the client configuration? What is the workload composition (read/async write/sync write)? What is the workload access pattern (sequential/random)? ...

> This could be just good enough for small businesses and moderate
> sized pools.

No doubt, and we are also very clear on who we target (enterprise customers). The beauty of ZFS is the flexibility of its implementation. By supporting multiple log device types and configurations, it ultimately enables a broad range of performance capabilities!

Best regards,

Chris

--
Christopher George
cgeorge at ddrdrive.com
http://www.ddrdrive.com
Re: [zfs-discuss] what have you been buying for slog and l2arc?
> Are people getting intel 330's for l2arc and 520's for slog?

Unfortunately, the Intel 520 does *not* power protect its on-board volatile cache (unlike the Intel 320/710 SSD).

Intel has an eye-opening technology brief describing the benefits of power-loss data protection at:

http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-series-power-loss-data-protection-brief.html

Intel's brief also clears up a prior controversy about what types of data are actually cached; per the brief, it is both user and system data!

Best regards,

Christopher George
www.ddrdrive.com

*** The Intel 311 (SLC NAND) also fails to support on-board power protection.
Re: [zfs-discuss] what have you been buying for slog and l2arc?
> Is your DDRdrive product still supported and moving?

Yes, we now exclusively target ZIL acceleration.

We will be at the upcoming OpenStorage Summit 2012, and encourage those attending to stop by our booth and say hello :-)

http://www.openstoragesummit.org/

> Is it well supported for Illumos?

Yes! Customers using Illumos-derived distros make up a good portion of our customer base.

Thanks,

Christopher George
www.ddrdrive.com
Re: [zfs-discuss] what have you been buying for slog and l2arc?
> I am glad to hear that both user AND system data is stored. That is
> rather reassuring. :-)

I agree!

---
[Excerpt from the linked Intel Technology Brief]

What Type of Data is Protected:
During an unsafe shutdown, firmware routines in the Intel SSD 320 Series respond to power loss interrupt and make sure both user data and system data in the temporary buffers are transferred to the NAND media.
---

I was taking "user data" to indicate actual txg data and "system data" to mean the SSD's internal metadata... I'm curious, any other interpretations?

Thanks,

Chris

Christopher George
cgeorge at ddrdrive.com
http://www.ddrdrive.com/
Re: [zfs-discuss] Separate Log Devices
> The guide suggests that the zil be sized to 1/2 the amount of ram in
> the server which would be 1GB.

The ZFS Best Practices Guide does detail the absolute maximum size the ZIL can grow in theory, which as you stated is 1/2 the size of the host's physical memory. But in practice, the very next bullet point details the log device sizing equation, which we have found to be a more relevant indicator. Excerpt below:

    For a target throughput of X MB/sec and given that ZFS pushes
    transaction groups every 5 seconds (and have 2 outstanding), we
    also expect the ZIL to not grow beyond X MB/sec * 10 sec. So to
    service 100MB/sec of synchronous writes, 1 GB of log device
    should be sufficient.

> What happens if I oversize the zil?

Oversizing the log device capacity has no negative repercussions other than the underutilization of your SSD.

> If I create a 1GB slice for the zil, can I add another slice for
> another zil in the future when more ram is added?

If the question is whether multiple disk slices can be striped to aggregate capacity, then the answer is yes. Be aware that with most SSDs, including the Intel X25-E, using a disk slice instead of the entire device will automatically disable the on-board write cache.

Christopher George
Founder / CTO
http://www.ddrdrive.com/
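P.S. A minimal sketch of the sizing arithmetic and of striping in a second slice later (pool, device, and slice names are hypothetical):

    # Sizing rule of thumb from the guide:
    #   log capacity ~= target sync throughput * 10 sec
    #   e.g. 100 MB/sec * 10 sec = ~1 GB

    # Add the first 1GB slice as a dedicated log device:
    zpool add tank log c4t0d0s0

    # Later, stripe in a second slice (a second top-level log vdev):
    zpool add tank log c5t0d0s0

    # Confirm both slices appear under "logs":
    zpool status tank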
Re: [zfs-discuss] ZFS and TRIM
> So, the bottom line is that Solaris 11 Express can not use TRIM and
> SSD?

Correct.

> So, it might not be a good idea to use a SSD?

It is true that a Flash based SSD will be adversely impacted by ZFS not supporting TRIM, especially for the ZIL accelerator. But a DRAM based SSD is immune to TRIM support status and thus unaffected. Actually, TRIM support would only add unnecessary overhead to the DDRdrive X1's device driver.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Lower latency ZIL Option?: SSD behind Controller BB Write Cache
> ZIL OPTIONS: Obviously a DDRdrive is the ideal (36k 4k random
> IOPS***) but for the same budget I can get 2x Vertex 2 EX 50GB drives
> and put each behind it's own P410 512MB BBWC controller. The Vertex 2
> EX goes for approximately $900 each online, while the P410/512 BBWC
> is listed at HP for $449 each.

Cost-wise, you should contact us for a quote, as we are price competitive with just a single SSD/HBA combination. Especially as one obtains 4GB instead of 512MB of ZIL accelerator capacity.

> Assuming the SSDs can do 6300 4k random IOPS*** and that the
> controller cache confirms those writes in the same latency as the

For 4KB random writes you need to look closely at slides 47/48 of the referenced presentation (http://www.ddrdrive.com/zil_accelerator). The 6443 IOPS is obtained after testing for *only* 2 hours post unpackaging or secure erase. The slope of both curves gives a hint, as the Vertex 2 EX does not level off and will continue to decrease. I am working on a new presentation focusing on this very fact, random write IOPS performance over time (the life of the device). Suffice it to say, 6443 IOPS is *not* worst-case performance for random writes on the Vertex 2 EX.

> DDRdrive (both PCIe attached RAM?) then we should have DDRdrive type
> latency up to 6300 sustained IOPS.

All tests used a QD (Queue Depth) of 32, which will hide the device latency of a single IO. Very meaningful, as real-life workloads can be bound by even a single outstanding IO. Let's trace the latency to determine which has the advantage. For the SSD/HBA combination, an IO has to run the gauntlet through two controllers (HBA and SSD) and propagate over a SATA cable. The DDRdrive X1 has a single unified controller and no extraneous SATA cable, see slides 15-17.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] ZFS/NFS benchmarking - is this normal?
> So, my questions:
> ...
> 2) Are they ways to see if the L2ARC or ZIL are being utilised (and
> how effectively)?

Richard Elling has an excellent dtrace script (zilstat) to exactly answer how much activity (synchronous writes) the ZIL encounters.

See link:

http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
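P.S. A hedged sketch of running zilstat while your workload is active (check the script's header for the exact options your copy supports):

    chmod +x zilstat.ksh
    ./zilstat.ksh 1 10      # one-second samples, ten of them
    # Non-zero byte/IOP columns while the workload runs mean the ZIL
    # is being exercised by synchronous writes; all zeros mean the
    # workload is effectively asynchronous.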
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> However, this *can* be overcome by frequently re-formatting the SSD
> (not the Solaris format, a low-level format using a vendor-supplied
> utility).

For those looking to Secure Erase an OCZ SandForce based SSD to reclaim performance, the following OCZ Forum thread might be of interest:

http://www.ocztechnologyforum.com/forum/showthread.php?75773-Secure-Erase-TRIM-and-anything-else-Sandforce

OCZ uses the term DuraClass as a catch-all for the algorithms controlling wear leveling, drive longevity... There is a direct correlation between Secure Erase frequency and expected SSD lifetime.

Thread #1, detailing a recommended frequency of Secure Erase use:

    3) Secure erase a drive every 6 months to free up previously read
    only blocks, secure erase every 2 days to get round Duraclass and
    you will kill the drive very quickly

Thread #5, explaining DuraClass and its relationship to TRIM:

    Duraclass is limiting the speed of the drive NOT TRIM. TRIM is
    used along with wear levelling.

Thread #6 provides more details on DuraClass and TRIM:

    Now Duraclass monitors all writes and control's encryption and
    compression, this is what effects the speed of the blocks being
    written to..NOT the fact they have been TRIM'd or not TRIM'd. You
    guys have become fixated at TRIM not speeding up the drive and
    forget that Duraclass controls all writes incurred by the drive
    once a GC map has been written.

The above excerpts were written by an OCZ-employed thread moderator (Tony).

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> You're assuming that the "into an empty device" performance is
> required by their application.

My assumption was stated in the paragraph prior, i.e. the vendor-promised random write IOPS. Based on the inquiries we receive, most *actually* expect an OCZ SSD to perform as specified, which is 50K 4KB random writes for both the Vertex 2 EX and the Vertex 2 Pro.

The point I was trying to make: Secure Erase is not a viable solution to the write IOPS degradation of the above-listed SSDs relative to published specifications. I think we can all agree, if Secure Erase could magically solve the problem it would already be implemented by the SSD controller.

> For many users, the worst-case steady-state of the device (6k IOPS
> the Vertex2 EX, depending on workload, as per slide 48 in your
> presentation) is so much faster than a rotating drive (50x faster,
> assuming that cache disabled on a rotating drive is roughly 100 IOPS
> with queueing), that it'll still provide a huge performance boost
> when used as a ZIL in their system.

I agree 100%. I never intended to insinuate otherwise :-)

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> I'm not sure if TRIM will work with ZFS.

Neither ZFS nor the ZIL code in particular supports TRIM.

> I was concerned that with trim support the SSD life and write
> throughput will get affected.

Your concerns about sustainable write performance (IOPS) for a Flash based SSD are valid; the resulting degradation will vary depending on the controller used.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> I actually bought a SF-1200 based OCZ Agility 2 (60G)... Why are
> these not recommended?

The OCZ Agility 2, or any SF-1200 based SSD, is an excellent choice for the L2ARC, as its on-board volatile memory does *not* need power protection: the L2ARC contents are not required to survive a host power loss (at this time). Also, checksum fallback to the pool provides data redundancy.

The ZIL accelerator's requirements differ from the L2ARC's, as its very purpose is to guarantee *all* data written to the log can be replayed (on next reboot) in case of host failure.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
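P.S. A quick sketch of the distinction (pool/device names hypothetical): the same class of SSD can be perfectly safe as L2ARC yet unsafe as a log device.

    # L2ARC: contents need not survive power loss, and checksums fall
    # back to the pool, so an SF-1200 drive is fine here.
    zpool add tank cache c3t0d0

    # Dedicated log (slog): every acknowledged write must survive a
    # power cut, so the device needs a power-protected cache.
    zpool add tank log c4t0d0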
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> got it attached to a UPS with very conservative shut-down timing. Or
> are there other host failures aside from power a ZIL would be
> vulnerable too (system hard-locks?)?

Correct, a system hard-lock is another example...

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> How about comparing a non-battery backed ZIL to running a ZFS
> dataset with sync=disabled. Which is more risky?

Most likely, the 3.5" SSD's on-board volatile (not power protected) memory would be small relative to the transaction group (txg) size, and thus less risky than sync=disabled.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> It's generally a simple thing, but requires pulling the SSD from the
> server, connecting it to either a Linux or Windows box, running the
> reformatter, then replacing the SSD. Which, is a PITA.

This procedure is more commonly known as a Secure Erase, and it will return a Flash based SSD to its original (new) performance. But as demonstrated in my presentation comparing Flash to DRAM based SSDs for ZIL accelerator applicability, the most dramatic write IOPS degradation occurs in less than 10 minutes of sustained use. For reference:

http://www.ddrdrive.com/zil_accelerator.pdf

So for the tested devices (OCZ Vertex 2 EX / Vertex 2 Pro) to come close to matching the vendor-promised random write IOPS, one would have to remove the log device from the pool and Secure Erase it after every ~10 minutes of sustained ZIL use.

Would having to perform a Secure Erase every hour, day, or even week really be the most cost-effective use of an administrator's time?

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
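P.S. For anyone determined to try it, the maintenance cycle being described looks roughly like this (a sketch; device names hypothetical, and removing a log device requires a pool version with log device removal support, v19+):

    zpool remove tank c4t0d0     # detach the slog from the pool
    # ...pull the SSD, Secure Erase it with the vendor utility...
    zpool add tank log c4t0d0    # re-add the freshly erased device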
Re: [zfs-discuss] Looking for 3.5 SSD for ZIL
> To the OP: First off, what do you mean by sync=disabled???

I believe he is referring to ZIL synchronicity (PSARC/2010/108).

http://arc.opensolaris.org/caselog/PSARC/2010/108/20100401_neil.perrin

The following presentation by Robert Milkowski does an excellent job of placing it in a larger context:

http://www.oug.org/files/presentations/zfszilsynchronicity.pdf

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
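P.S. The case adds a per-dataset "sync" property; a short sketch (dataset name hypothetical):

    zfs set sync=disabled tank/scratch   # sync writes treated as async
    zfs get sync tank/scratch            # verify the setting
    zfs set sync=standard tank/scratch   # restore the default behavior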
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
> I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and
> I'd be interested if anyone else has.

I recently presented at the OpenStorage Summit 2010 and compared exactly the three devices you mention in your post (Vertex 2 EX, Vertex 2 Pro, and the DDRdrive X1) as ZIL Accelerators.

Jump to slide 37 for the write IOPS benchmarks:

http://www.ddrdrive.com/zil_accelerator.pdf

> and you *really* want to make sure you get the 4k alignment right

Excellent point; starting on slide 66 the performance impact of partition misalignment is illustrated. Considering the results, longevity might be an even greater concern than decreased IOPS performance, as ZIL acceleration is a worst-case scenario for a Flash based SSD.

> The DDRdrive is still the way to go for the ultimate ZIL
> accelleration, but it's pricey as hell.

In addition to product cost, I believe IOPS/$ is a relevant point of comparison. Google Products gives the price range for the OCZ 50GB SSDs:

Vertex 2 EX  (OCZSSD2-2VTXEX50G: $870 - $1,011 USD)
Vertex 2 Pro (OCZSSD2-2VTXP50G:  $399 - $525 USD)

4KB sustained and aligned mixed write IOPS results (see pdf above):

Vertex 2 EX  (6325 IOPS)
Vertex 2 Pro (3252 IOPS)
DDRdrive X1  (38701 IOPS)

Using the lowest online price for both the Vertex 2 EX and Vertex 2 Pro, and the full list price (SRP) of the DDRdrive X1, IOPS per dollar ($):

Vertex 2 EX  (6325 IOPS / $870)    =  7.27
Vertex 2 Pro (3252 IOPS / $399)    =  8.15
DDRdrive X1  (38701 IOPS / $1,995) = 19.40

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
> Why would you disable TRIM on an SSD benchmark?

Because ZFS does *not* support TRIM, the benchmarks are configured to replicate actual ZIL Accelerator workloads.

> If you're doing sustained high-IOPS workloads like that, the
> back-end is going to fall over and die long before the hour
> time-limit.

The reason the graphs are done in a timeline fashion is so you can look at any point in the 1-hour series to see how each device performs.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
> TRIM was putback in July... You're telling me it didn't make it into
> S11 Express?

Without top-level ZFS TRIM support, SATA framework (sata.c) support has no bearing on this discussion.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
> Furthermore, I don't think 1 hour sustained is a very accurate
> benchmark. Most workloads are bursty in nature.

The IOPS degradation is additive; the length of the first and second one-hour sustained periods is completely arbitrary. The takeaway from slides 1 and 2 is that drive inactivity has no effect on the eventual outcome.

So with either a bursty or a sustained workload the end result is always the same: dramatic write IOPS degradation after unpackaging or secure erase of the tested Flash based SSDs.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
> I'm doing compiles of the JDK, with a single backed ZFS system
> handing the files for 20-30 clients, each trying to compile a 15
> million-line JDK at the same time.

Very cool application! Can you share any metrics, such as the aggregate size of the source files compiled and the size of the resultant binaries?

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] OCZ RevoDrive ZFS support
> I'm curious if there is a support for OCZ RevoDrive SSD or any other
> SSD hooked directly on PCIe in Solaris.

The RevoDrive should not require a custom device driver, as it is based on the Silicon Image 3124 PCI-X RAID controller connected to a Pericom PCI-X to PCIe bridge chip (PI7C9X130). The required driver would be si3124(7D); I noticed the man page states NCQ is not supported. I found the following link detailing the status:

http://opensolaris.org/jive/thread.jspa?messageID=466436

> It might make an interesting L2ARC device, as it is definitely low
> cost.

It's based on multiple SandForce 1200 controllers. This is important to note because the on-board volatile caches are not power protected, so it is not a good fit for the ZIL Accelerator. But it is perfectly acceptable for the L2ARC, as drive contents are not required to survive a power failure (at this time).

Our PCIe based SSD, the DDRdrive X1, does require a dedicated device driver, and we exclusively target ZIL acceleration. We support OpenSolaris (2009-06 through b134), OpenIndiana, and NexentaStor 3.0. I am excited to announce we just completed validation and are now also supporting Solaris 11 Express!

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Any opinoins on these SSD's?
> Any opinions? stories? other models I missed?

I was a speaker at the recent OpenStorage Summit; my presentation, "ZIL Accelerator: DRAM or Flash?", might be of interest:

http://www.ddrdrive.com/zil_accelerator.pdf

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Bursty writes - why?
> Maybe this random-write issue with Sandforce would not be a problem?

It is most definitely a problem, as one needs to question the conventional assertion of a sequential write pattern. I presented some findings recently at the Nexenta Training Seminar in Rotterdam. Here is a link to an excerpt (full presentation available to those interested, email cgeorge at ddrdrive dot com):

http://www.ddrdrive.com/zil_iopattern_excerpt.pdf

In summary, a sequential write pattern is found for a pool with only a single file system. But as additional file systems are added, the resultant (or aggregate) write pattern trends to random: over 50% random with a pool containing just 5 filesystems. This makes intuitive sense knowing each filesystem has its own ZIL and they all share the dedicated log (ZIL Accelerator); see the sketch below.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
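P.S. A minimal illustration of why the aggregate pattern randomizes (pool/dataset names hypothetical): every file system carries its own ZIL chain, yet all of them land on the one shared log device.

    zpool create tank mirror c1t0d0 c2t0d0 log c3t0d0
    zfs create tank/fs1
    zfs create tank/fs2
    zfs create tank/fs3
    # Concurrent sync writes to fs1..fs3 interleave their
    # per-filesystem ZIL blocks on c3t0d0, so the device sees an
    # increasingly random write pattern as file systems are added.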
Re: [zfs-discuss] 4k block alignment question (X-25E)
> What is a NVRAM based SSD?

It is simply an SSD (Solid State Drive) which does not use Flash, but rather power-protected (non-volatile) DRAM, as the primary storage media.

http://en.wikipedia.org/wiki/Solid-state_drive

I consider the DDRdrive X1 to be an NVRAM based SSD even though we delineate the storage media used depending on host power condition. The X1 exclusively uses DRAM for all IO processing (host is on) and then Flash for permanent non-volatility (host is off).

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] 4k block alignment question (X-25E)
> I was wondering if anyone had a benchmarking showing this alignment
> mattered on the latest SSDs. My guess is no, but I have no data.

I don't believe there can be any doubt that a Flash based SSD (tier 1 or not) is negatively affected by partition misalignment. It is intrinsic to the required asymmetric erase/program dual operation and the resultant RMW (read-modify-write) penalty incurred by an unaligned write. This is detailed in the following vendor benchmarking guidelines (SF-1500 controller):

http://www.smartm.com/files/salesLiterature/storage/AN001_Benchmark_XceedIOPSSATA_Apr2010_.pdf

Highlight from the link:

    Proper partition alignment is one of the most critical attributes
    that can greatly boost the I/O performance of an SSD due to
    reduced read modify-write operations.

It should be noted, the above highlight only applies to a Flash based SSD; an NVRAM based SSD does *not* suffer the same fate, as its performance does not vary with partition (mis)alignment.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
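P.S. A hedged sketch of checking alignment on Solaris (device name hypothetical): with 512-byte sectors, a slice is 4 KB aligned when its starting sector is divisible by 8 (8 * 512 B = 4 KB).

    prtvtoc /dev/rdsk/c1t0d0s2
    # In the output, check the "First Sector" column for your slice:
    #   first sector 256 -> 256 % 8 == 0 -> 4 KB aligned
    #   first sector 34  ->  34 % 8 != 0 -> misaligned (RMW penalty)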
Re: [zfs-discuss] Best usage of SSD-disk in ZFS system
> Best performance this way, true. But it's not necessarily a solution
> for everyone.

I agree. Our product, the DDRdrive X1, an NVRAM based SSD, is definitely not targeted towards the home end-user. By nature of our component choices, fabrication tech, and ultimately price point, we are a ZIL accelerator well matched to the 24/7 demands of enterprise use.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] How does zil work
> Here is another very recent blog post from ConstantThinking:
>
> http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

Very well done, a highly recommended read.

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] High-Performance ZFS (2000MB/s+)
> I mean, could I stripe across multiple devices to be able to handle
> higher throughput?

Absolutely. Striping four DDRdrive X1s (a 16GB dedicated log) is extremely simple; see the sketch below. Each X1 has its own dedicated IOPS controller, critical for approaching linear synchronous write scalability. The same principles and benefits of multi-core processing apply here with multiple controllers. The performance potential of NVRAM based SSDs dictates moving away from a single/separate HBA based controller.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
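P.S. Striping log devices is just a matter of listing them in a single "zpool add ... log" invocation (pool/device names hypothetical):

    zpool add tank log c5t0d0 c6t0d0 c7t0d0 c8t0d0
    # Each device becomes its own top-level log vdev, and ZFS spreads
    # ZIL writes across all four.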
Re: [zfs-discuss] SSDs adequate ZIL devices?
> So why buy SSD for ZIL at all?

For the record, not all SSDs ignore cache flushes. There are at least two SSDs sold today that guarantee synchronous write semantics: the Sun/Oracle LogZilla and the DDRdrive X1.

Also, I believe it is more accurate to describe the root cause as not power protecting on-board volatile caches. The X25-E, for example, does implement the ATA FLUSH CACHE command, but does not have the required power protection to avoid transaction (data) loss.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] ssd pool + ssd cache ?
> No Slogs as I haven't seen a compliant SSD drive yet.

As the architect of the DDRdrive X1, I can state categorically the X1 correctly implements the SCSI Synchronize Cache (flush cache) command.

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] ssd pool + ssd cache ?
Thanks Garrett!

> 2) it is dependent on an external power source (a little wall wart
> provides low voltage power to the card... I don't recall the voltage
> off hand)

9V DC.

> 3) the contents of the card's DDR ram are never flushed to
> non-volatile storage automatically, but require an explicit action
> from the administrator to save or restore the contents of the DDR to
> NAND flash. (This operation takes 60 seconds, during which the card
> is not responsive to other commands.)

For the internally developed and RTM OpenSolaris/NexentaStor 3.0 device driver this is not the case, as automatic backup/restore is the default configuration. On host power-down/failure the X1 automatically performs a backup, i.e. the DRAM is copied to the on-board NAND (Flash). On the next boot, the NAND is automatically restored to DRAM. This process is seamless and doesn't require any user intervention.

*** The hardware support required for automatic backup/restore was not yet available when Garrett wrote the blk2scsa based driver.

> 4) the cost of the device is significantly higher (ISTR $1800, but
> it may be less than that) than a typical SSD, with much smaller
> capacity (4GB) than typical SSD. But it offers much lower latencies
> and higher performance than any other SSD I've encountered.

The last I checked, the STEC SSD resold by Sun/Oracle, which also correctly implements cache flush, was $6,000. So for SSDs that fully comply with the POSIX requirements for synchronous write transactions and do not lose transactions on a host power failure, we are competitively priced at $1,995 SRP.

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] SSD best practices
To clarify, the DDRdrive X1 is not an option for OpenSolaris today, irrespective of specific features, because the driver is not yet available.

When our OpenSolaris device driver is released, later this quarter, the X1 will have updated firmware to automatically provide backup/restore based on an external power source. We hope the X1 will be the first in a family of products, where future iterations will also offer an internal power source option.

Feedback from this list also played a decisive role in our forthcoming strategy to focus exclusively on serving the ZFS dedicated log market.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] SSD best practices
> I think the DDR drive has a battery and can dump to a cf card.

The DDRdrive X1's automatic backup/restore feature utilizes on-board SLC NAND (high-quality Flash) and is completely self-contained. Neither the backup nor the restore involves data transfer over the PCIe bus or to/from removable media.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] SSD best practices
IMHO, whether a dedicated log device needs redundancy (mirrored) should be determined by the dynamics of each end-user environment (zpool version, goals/priorities, and budget).

If mirroring is deemed important, a key benefit of the DDRdrive X1 is the HBA / storage device integration. For example, to approach the redundancy of a mirrored DDRdrive X1 pair, a SATA Flash based SSD solution would require each SSD to have a dedicated HBA controller, as sharing an HBA between the two mirrored SSDs would introduce a single point of failure not present in the X1 configuration. Even with dedicated HBAs, removing the need for SATA cables while halving both the controller count and the data path travel will notably increase reliability.

It should be mentioned, one plus for a mirrored Flash SSD with dedicated HBAs (no cache, or write-through) is the lack of required power protection.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
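P.S. For completeness, mirroring a log is a one-liner either way (pool/device names hypothetical):

    zpool add tank log mirror c4t0d0 c5t0d0
    # With two DDRdrive X1s the same command applies; the difference
    # discussed above is in the hardware path, not the ZFS syntax.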
Re: [zfs-discuss] SSD best practices
There is no definitive answer (yes or no) on whether to mirror a dedicated log device, as reliability is one of many variables. This leads me to the frequently given but never satisfying "it depends".

In a time when too many good questions go unanswered, let me take advantage of our less rigid rules of engagement and share some facts about the DDRdrive X1 which are uncommonly shared:

- 12-layer PCB (layman translation: more layers, better SI, higher cost)
- Nelco N4000-13 EP laminate (extremely high quality, a price to match)
- Solid via construction (hold an X1 in front of a bright light: no holes :-)
- Best-of-breed components, all 520 of them
- Assembled and validated in Northern CA, USA
- 1.5 weeks of test/burn-in of every X1 (extensive DRAM validation)

In summary, the DDRdrive X1 is designed, built, and tested with immense pride and an overwhelming attention to detail.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
> Well, I did look at it but at that time there was no Solaris support
> yet. Right now it seems there is only a beta driver?

Correct, we just completed functional validation of the OpenSolaris driver. Our focus has now turned to performance tuning and benchmarking. We expect to formally introduce the DDRdrive X1 to the ZFS community later this quarter. It is our goal to focus exclusively on the dedicated ZIL device market going forward.

> I kind of remember that if you'd want reliable fallback to nvram,
> you'd need an UPS feeding the card.

Currently, a dedicated external UPS is required for correct operation. Based on community feedback, we will be offering automatic backup/restore prior to release. This guarantees the UPS will only be required for 60 secs to successfully back up the drive contents on a host power or hardware failure. Dutifully, on the next reboot the restore will occur prior to the OS loading, for seamless non-volatile operation.

Also, we have heard loud and clear the requests for an internal power option. It is our intention that the X1 will be the first in a family of products all dedicated to ZIL acceleration, for not only OpenSolaris but also Solaris 10 and FreeBSD.

> Also, we'd kind of like to have a SnOracle supported option.

Although a much smaller company, we believe our singular focus and absolute passion for ZFS and the potential of Hybrid Storage Pools will serve our customers well. We are actively designing our soon-to-be-available support plans. Your voice will be heard; please email me directly at cgeorge at ddrdrive dot com with requests, comments, and/or questions.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> From the web page it looks like this is a card that goes into the
> computer system. That's not very useful for enterprise applications,
> as they are going to want to use an external array that can be used
> by a redundant pair of servers.

The DDRdrive X1 does utilize a half-length/full-height/two-slot PCIe plug-in card form factor. So for systems such as the Sun Storage 7310/7410, we are not a solution. Sun does offer a Write Flash Accelerator (Logzilla) to satisfy both single and clustered controller configurations.

Our intention is to provide enterprise customers (non-clustered) an additional option.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> I'm not sure about others on the list, but I have a dislike of AC
> power bricks in my racks.

I definitely empathize with your position concerning AC power bricks, but until the perfect battery is created, and we are far from it, it comes down to tradeoffs. I personally believe the ignition risk, thermal wear-out, and the inflexible proprietary nature of Li-Ion solutions simply outweigh the benefits of internal or all-inclusive mounting for enterprise-bound NVRAM.

> Is the state of the power input exposed to software in some way? In
> other terms, can I have a nagios check running on my server that
> triggers an alert if the power cable accidentally gets pulled out?

Absolutely, the X1 monitors the external supply and can detect not only a disconnect but any loss of power. In all cases, the card throws an interrupt so that the device driver (and ultimately user space) can be immediately notified. The X1 does not rely on external power until the host power drops below a certain threshold, so attaching/detaching the external power cable has no effect on data integrity as long as the host is powered on.

> OK, which means that the UPS must be separate to the UPS powering
> the server then.

Correct, a dedicated (in this case redundant) UPS is expected.

> Any plans on a pci-e multi-lane version then?

Not at this time. In addition to the reduced power and thermal output, the PCIe x1 connector has the added benefit of not competing with other HBAs which do require an x4 or x8 PCIe connection.

Very appreciative of the feedback!

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> Is there any data out there that have tracked these sort of ignition
> incidents? I have to admit I'd never heard of this. We have quite a
> few BBU backed RAID controllers in our servers and I've never had
> anything remotely like this occur. I know anecdotal evidence is
> meaningless, but this definitely surprised me a little.

I agree, it would be very informative if RAID HBA vendors would publish failure statistics for their Li-Ion based BBU products.

> My gut tells me the risk of this is pretty low and most are going to
> prefer the convenience of an onboard BBU to installing UPS'es in all
> their racks (as good a practice as that may be).

Again I agree; I am not recommending, nor did I mean to allude, that to be the proper and/or preferred solution for RAID controllers. To my knowledge, the mAh requirements of a DDRdrive X1 class product cannot be supported by any of the BBUs currently found on RAID controllers. It would require either a substantial increase in energy density or a decrease in packaging volume, both of which incur additional risks.

> Interesting product though!

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> That's kind of an overstatement. NVRAM backed by on-board LI-Ion
> batteries has been used in storage industry for years;

Respectfully, I stand by my three points about Li-Ion batteries as they relate to enterprise-class NVRAM: ignition risk, thermal wear-out, and proprietary design. As a prior post stated, there is a dearth of published failure statistics for Li-Ion based BBUs.

> I can easily point out a company that has shipped tens of thousands
> of such boards over last 10 years.

No argument here, but I would venture the risks of consumer Li-Ion based products did not become apparent or commonly accepted until the user base grew several orders of magnitude greater than tens of thousands.

For the record, I agree there is a marked convenience with an integrated high-energy Li-Ion battery solution, but at what cost? We chose an external solution because it is a proven and industry-standard method of enterprise-class data backup.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> Why not enlighten EMC/NTAP on this then?

On the basic chemistry and possible failure characteristics of Li-Ion batteries? I will agree, if I had system-level control as in either example, one could definitely help mitigate said risks, compared to selling a card-based product where I have very little control over the thermal envelopes I am subjected to.

> Could you please elaborate on the last statement, provided you meant
> anything beyond UPS is a power-backup standard?

I do think the discourse is healthy and relevant, but at this point I am comfortable agreeing to disagree. I respect your point of view, and do agree strongly that Li-Ion batteries play a critical and highly valued role in many industries.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> I see nothing in the design that precludes a customer from using a
> Li-Ion battery, if they so desire. Perhaps the collective has
> forgotten that DC power is one of the simplest and most widespread
> interfaces around? :-)

Richard,

Very good point! We have already had a request for the DC jack to be unpopulated so that an internal power source could be utilized. We will make this modification available to any customer who asks.

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> Personally I'd say it's a must. Most DC's I operate in wouldn't
> tolerate having a card separately wired from the chassis power.

May I ask the list if this is a hard requirement for anyone else? Please email me directly at cgeorge at ddrdrive dot com.

Thank you,

Christopher George
Founder/CTO
www.ddrdrive.com
[zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
The DDRdrive X1 OpenSolaris device driver is now complete; please join us in our first-ever ZFS Intent Log (ZIL) beta test program.

A select number of X1s are available for loan; preferred candidates would have a validation background and/or a true passion for torturing new hardware/drivers :-)

We are singularly focused on the ZIL device market, so a test environment bound by synchronous writes is required. The beta program will provide extensive technical support and a unique opportunity to have direct interaction with the product designers.

Would you like to take part in the advancement of Open Storage and explore the far-reaching potential of ZFS based Hybrid Storage Pools? If so, please send an inquiry to zfs at ddrdrive dot com.

The drive for speed,

Christopher George
Founder/CTO
www.ddrdrive.com

*** Special thanks go out to Sun employees Garrett D'Amore and James McPherson for their exemplary help and support. Well done!
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
Excellent questions!

> I see the PCI card has an external power connector - can you explain
> how/why that's required, as opposed to using an on card battery or
> similar.

DDRdrive X1 ZIL functionality is best served with an externally attached UPS; this allows the X1 to perform as a non-volatile storage device without specific user configuration or unique operation.

An often overlooked aspect of batteries (irrespective of technology, internal or external) is their limited lifetime and the varying degrees of maintenance and oversight they require. For example, a lithium (Li-Ion) battery supply, as used by older NVRAM products and not the X1, does have the minimum required energy density for an internal solution. But it has a fatal flaw for enterprise applications: the possibility of an ignition-mode failure. Google "lithium battery fire". Such an instance, even if rare, would be catastrophic not only to the on-card data but to the host server and so on...

Supercapacitors are another alternative, which thankfully do not share the ignition-mode failure mechanism of Li-Ion, but are hampered mainly by cost, with some longevity concerns which can be addressed.

In the end, we selected data integrity, cost, and serviceability as our top three priorities. This led us to the industry-standard external lead-acid battery as sold by APC.

Key benefits of the DDRdrive X1 power solution:

1) Data Integrity - Supports multiple back-to-back power failures. A single DDRdrive X1 uses less than 5W when the host is powered down, so even a small UPS is over-provisioned and, unlike an internal solution, will not normally require a lengthy recharge time prior to the next power incident. Optionally, a backup to NAND can be performed to remove the UPS duration as a factor.

2) Cost Effective / Flexible - The Smart-UPS SC 450VA (280 Watts) is an excellent choice for most installations and retails for approximately $150.00. Flexibility is in regard to UPS selection, as it can be right-sized (duration) for each individual application if needed.

3) Reliability / Maintenance - UPS front panel LED status for battery replacement, and audible alarms when the battery is low or non-operational. Industry-standard battery form factor backed by APC, the industry-leading manufacturer of enterprise-class backup solutions.

> What happens if the *host* power to the card fails?

Nothing, the DDRdrive X1's data integrity is guaranteed by the attached UPS.

> The 155mb rate for sustained writes is low for DDR ram?

The DRAM's value add is its extremely low latency (even compared to NAND) and other intrinsic properties such as longevity and reliability. The read/write sequential bandwidth is completely bound by the PCI Express interface.

> Is this because the backup to NAND is a constant thing, rather than
> only at power fail?

No, the backup to NAND is not continual. All host IO is directed to DRAM for maximum performance, while the NAND only provides an optional (user-configured) backup/restore feature.

Christopher George
Founder/CTO
www.ddrdrive.com