Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-07 Thread Anonymous Remailer (austria)

 It depends on the model. Consumer models are less likely to
 immediately flush. My understanding is that this is done in part to do
 some write coalescing and reduce the number of P/E cycles. Enterprise
 models should either flush, or contain a supercapacitor that provides
 enough power for the drive to complete writing any data in its buffer.

My Home Fusion SSD runs on banana peels and eggshells and uses a Flux
Capacitor. I've never had a failure.




Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-07 Thread Sašo Kiselkov
On 08/07/2012 02:18 AM, Christopher George wrote:
 I mean this as constructive criticism, not as angry bickering. I totally
 respect you guys doing your own thing.
 
 Thanks, I'll try my best to address your comments...

Thanks for your kind reply, though there are some points I'd like to
address, if that's okay.

 *) Increased capacity for high-volume applications.
 
 We do have a select number of customers striping two
 X1s for a total capacity of 8GB, but for a majority of our customers 4GB
 is perfect.  Increasing capacity
 obviously increases the cost, so we wanted the baseline
 capacity to reflect a solution to most but not every need.

Certainly, for most uses this isn't an issue. I just threw that in
there because, considering how cheap DRAM and flash are nowadays and how
easy it is to create disk pools which push 2GB/s in write throughput, I
was hoping you guys would be keeping pace with that (getting 4GB of sync
writes in the txg commit window can be tough, but not unthinkable). In
any case, by dismissing it with "simply get two", you are effectively
doubling my slog costs which, considering the recommended practice is to
get a slog mirror, would mean that I have to get four X1s. That's $8k in
list prices and 8 full-height PCI-e slots wasted (seeing as how an X1 is
wider than the standard PCI-e card). Not many systems can do that
(that's why I said solder the DRAM and go low-profile).
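
To put a number on that, here's a back-of-the-envelope sketch (my own
illustrative figures, nothing DDRdrive publishes): a slog only has to
absorb the sync writes that can accumulate before the next txg commit,
so the capacity you need is roughly the sustained sync write throughput
times the txg interval.

    # Rough slog sizing sketch -- all figures are assumptions for
    # illustration, not measurements of any particular product.
    sync_write_gb_s = 2.0   # sustained sync write throughput of the pool
    txg_interval_s  = 5.0   # zfs_txg_timeout is commonly ~5s on Illumos
    slog_needed_gb  = sync_write_gb_s * txg_interval_s
    print("worst-case slog capacity: ~%.0f GB" % slog_needed_gb)   # ~10 GB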

 *) Remove the requirement to have an external UPS (couple of
supercaps? microbattery?)
 
 Done!  We will be formally introducing an optional DDRdrive
 SuperCap PowerPack at the upcoming OpenStorage Summit.

Great! Though I suppose that will inflate the price even further (seeing
as you used the word optional).

 *) Use cheaper MLC flash to lower cost - it's only written to in case
of a power outage, anyway so lower write cycles aren't an issue and
modern MLC is almost as fast as SLC at sequential IO (within 10%
usually).
 
 We will be staying with SLC not only for performance but
 longevity/reliability.
 Check out the specifications (ie erase/program cycles and required ECC)
 for a modern 20 nm MLC chip and then let me know if this is where you
 *really* want to cut costs :)

MLC is so much cheaper that you can simply slap on twice as much and use
the rest for ECC, mirroring or simply overprovisioning sectors. The
common practice for extending the lifecycle of MLC is short-stroking
it, i.e. using only a fraction of the capacity. E.g. a 40GB MLC unit
with 5-10k cycles per cell can be turned into a 4GB unit (with the
controller providing wear leveling) with effectively 50-100k cycles
(that's SLC land) for about a hundred bucks. Also, since I'm already
mirroring it, with ZFS checksums providing integrity checking, your
argument simply doesn't hold up.
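
To spell out the arithmetic behind that claim (illustrative figures of
my own, not the spec of any particular device):

    # Endurance arithmetic for a short-stroked MLC device -- assumed figures.
    raw_capacity_gb = 40.0    # physical MLC capacity
    exposed_gb      = 4.0     # capacity actually presented to the host
    pe_cycles       = 10000   # typical MLC program/erase rating (5-10k)

    # Wear leveling spreads writes to the exposed 4GB across all 40GB of
    # cells, so endurance scales with the overprovisioning ratio.
    overprovision    = raw_capacity_gb / exposed_gb
    effective_cycles = pe_cycles * overprovision
    print("effective P/E cycles per exposed GB: ~%d" % effective_cycles)  # ~100,000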

Oh and don't count on Illumos missing support for SCSI Unmap or SATA
TRIM forever. Work is underway to rectify this situation.

 *) PCI Express 3.0 interface (perhaps even x4)
 
 Our product is FPGA based and the PCIe capability is the biggest factor
 in determining component cost.  When we introduced the X1, the FPGA cost
 *alone* to support just PCIe Gen2 x8 was greater than the current street
 price of the DDRdrive X1.

I've always had a bit of an issue with non-hotswappable storage systems.
What if an X1 slog dies? I need to power the machine down, open it up,
take out the slog, put in another one and power it back up. Since ZFS
has slog removal support, there's no reason to go for non-hotpluggable
slogs anyway. What about 6G SAS? Dual-ported, you could push around
12 Gbit/s of bandwidth to/from the device, way more than the current
250 MB/s, and get hotplug support in there too.
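
For the bandwidth claim, a quick sketch (assuming 8b/10b line encoding
on 6G SAS; the figures are approximate):

    # Dual-ported 6G SAS vs. the ~250 MB/s cited for the X1 -- rough numbers.
    lanes        = 2            # dual-ported
    line_rate_gb = 6.0          # Gbit/s per lane (raw line rate)
    usable_mb_s  = lanes * line_rate_gb * 1e9 * (8.0 / 10.0) / 8 / 1e6
    print("dual-port 6G SAS: ~%.0f MB/s, %.1fx of 250 MB/s"
          % (usable_mb_s, usable_mb_s / 250.0))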

 *) At least update the benchmarks on your site to compare against modern
flash-based competition (not the Intel X25-E, which is seriously
stone age by now...)
 
 I completely agree we need to refresh the website; not even the photos
 are representative of our shipping product (we now offer VLP DIMMs).
 We are engineers first and foremost, but an updated website is in the
 works.
 
 In the meantime, we have benchmarked against both the Intel 320/710
 in my OpenStorage Summit 2011 presentation which can be found at:
 
 http://www.ddrdrive.com/zil_rw_revelation.pdf

I've always had a bit of an issue with your benchmarks. First off, you're
only ever doing synthetics. They are very nice, but don't provide much
in terms of real-world perspective. Try to compare on price too. Take
something like a Dell R720, stick in the equivalent (in terms of cost!)
of DRAM SSDs and Flash SSDs (i.e. for one X1 you're looking at roughly
four Intel 710s) and run some real workloads (database benchmarks,
virtualization benchmarks, etc.). Experiment beats theory, every time.

 *) Lower price, lower price, lower price.
I can get 3-4 200GB OCZ Talos-Rs for $2k FFS. That means I could
equip my machine with one to two mirrored slogs and nearly 800GB
worth of L2ARC for the price of a single X1.
 
 I 

Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-07 Thread Michael Hase

On Mon, 6 Aug 2012, Christopher George wrote:


I mean this as constructive criticism, not as angry bickering. I totally
respect you guys doing your own thing.


Thanks, I'll try my best to address your comments...


*) At least update the benchmarks on your site to compare against modern
   flash-based competition (not the Intel X25-E, which is seriously
   stone age by now...)


I completely agree we need to refresh the website; not even the photos are
representative of our shipping product (we now offer VLP DIMMs).

We are engineers first and foremost, but an updated website is in the works.

In the meantime, we have benchmarked against both the Intel 320/710
in my OpenStorage Summit 2011 presentation which can be found at:

http://www.ddrdrive.com/zil_rw_revelation.pdf


Very impressive IOPS numbers, although I have some thoughts on the
benchmarking method itself. IMHO the comparison shouldn't be raw IOPS
numbers on the DDRdrive itself as tested with Iometer (it's only 4GB), but
real-world numbers on a real-world pool consisting of spinning disks with
the DDRdrive acting as a ZIL accelerator.


I just introduced an Intel 320 120GB as a ZIL accelerator for a simple zpool
with two SAS disks in a RAID0 configuration, and it's not as bad as in your
presentation. It shows about 50% of the possible NFS ops with the SSD as
ZIL versus no ZIL (sync=disabled on oi151), and about 6x-8x the
performance compared to the pool without any accelerator and
sync=standard. The case with no ZIL is the upper limit one can achieve on
a given pool, in my case creation of about 750 small files/sec via NFS.
With the SSD it's 380 files/sec (the NFS stack is a limiting factor, too). Or
about 2400 8k write IOPS with the SSD vs. 11900 IOPS with the ZIL disabled,
and 250 IOPS without an accelerator (GNU dd with oflag=sync). Not bad at all.
This could be just good enough for small businesses and moderate sized
pools.
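
For anyone who wants to reproduce that last measurement without dd,
here's a minimal Python sketch of the same idea -- O_SYNC writes counted
per second; the file path, block size and runtime are placeholders for
whatever pool you're testing:

    # Count how many synchronous 8k writes per second a pool sustains.
    import os, time

    PATH    = "/tank/synctest.bin"   # hypothetical test file on the pool
    BLOCK   = b"\0" * 8192           # 8 KiB per write, as in the dd test
    SECONDS = 10

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    ops = 0
    start = time.time()
    while time.time() - start < SECONDS:
        os.write(fd, BLOCK)          # each write must reach stable storage (ZIL/slog)
        ops += 1
    os.close(fd)
    elapsed = time.time() - start
    print("%.0f sync write IOPS (%d writes in %.1fs)" % (ops / elapsed, ops, elapsed))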


Michael

--
Michael Hase
edition-software GmbH
http://edition-software.de


Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-07 Thread Bob Friesenhahn

On Tue, 7 Aug 2012, Sašo Kiselkov wrote:


MLC is so much cheaper that you can simply slap on twice as much and use
the rest for ECC, mirroring or simply overprovisioning sectors. The
common practice for extending the lifecycle of MLC is short-stroking
it, i.e. using only a fraction of the capacity. E.g. a 40GB MLC unit
with 5-10k cycles per cell can be turned into a 4GB unit (with the
controller providing wear leveling) with effectively 50-100k cycles
(that's SLC land) for about a hundred bucks. Also, since I'm already
mirroring it, with ZFS checksums providing integrity checking, your
argument simply doesn't hold up.


Remember he also said that the current product is based principally on 
an FPGA.  This FPGA must be interfacing directly with the Flash device 
so it would need to be substantially redesigned to deal with MLC Flash 
(probably at least an order of magnitude more complex), or else a 
microcontroller would need to be added to the design, and firmware 
would handle the substantial complexities.  If the Flash device writes 
slower, then the power has to stay up longer.  If the Flash device 
reads slower, then it takes longer for the drive to come back on 
line.


Quite a lot of product would need to be sold in order to pay for both 
re-engineering and the cost of running a business.


Regardless, continual product re-development is necessary or else it 
will surely die.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-07 Thread Sašo Kiselkov
On 08/07/2012 04:08 PM, Bob Friesenhahn wrote:
 On Tue, 7 Aug 2012, Sašo Kiselkov wrote:

 MLC is so much cheaper that you can simply slap on twice as much and use
 the rest for ECC, mirroring or simply overprovisioning sectors. The
 common practice for extending the lifecycle of MLC is short-stroking
 it, i.e. using only a fraction of the capacity. E.g. a 40GB MLC unit
 with 5-10k cycles per cell can be turned into a 4GB unit (with the
 controller providing wear leveling) with effectively 50-100k cycles
 (that's SLC land) for about a hundred bucks. Also, since I'm already
 mirroring it, with ZFS checksums providing integrity checking, your
 argument simply doesn't hold up.
 
 Remember he also said that the current product is based principally on
 an FPGA.  This FPGA must be interfacing directly with the Flash device
 so it would need to be substantially redesigned to deal with MLC Flash
 (probably at least an order of magnitude more complex), or else a
 microcontroller would need to be added to the design, and firmware would
 handle the substantial complexities.  If the Flash device writes slower,
 then the power has to stay up longer.  If the Flash device reads slower,
 then it takes longer for the drive to come back on line.

Yeah, I know, but then, you can interface with an existing
industry-standard flash controller, no need to design your own (reinvent
the wheel). The choice of FPGA is good for some things, but flexibility
in exchanging components certainly isn't one of them.

If I were designing something akin to the X1, I'd go with a generic
embedded CPU design (e.g. a PowerPC core) interfacing with standard
flash components and running the primary front-end from the chip's
on-board DRAM. I mean, just to give you some perspective: I could build
a full computer with 8GB of mirrored ECC DRAM which interfaces with the
host machine via an off-the-shelf 6G SAS HBA (with two 4x-wide 6G SAS
ports) or perhaps even an InfiniBand adapter with RDMA, includes a small
SSD in its SATA bay, and has a tiny UPS battery to run the whole thing
for a few minutes while we write DRAM contents to flash in case of a
power outage (the current X1 doesn't even include this in its base
design). And that's something I could do with off-the-shelf components
for less than $2k (probably a whole lot less) at a production volume
of _1_.
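
Just to show the destage math behind that "few minutes" of battery (my
own assumptions, not a measured design):

    # How long the battery must last to dump DRAM to flash on power loss.
    dram_gib          = 8.0     # mirrored ECC DRAM in the hypothetical box
    flash_write_mib_s = 200.0   # assumed sustained write speed of the small SSD
    margin            = 2.0     # safety factor for stalls, retries, etc.

    dump_s = dram_gib * 1024.0 / flash_write_mib_s
    print("raw dump time : ~%.0f s" % dump_s)                         # ~41 s
    print("battery budget: ~%.0f s (~%.1f min)" % (dump_s * margin,
                                                   dump_s * margin / 60.0))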

 Quite a lot of product would need to be sold in order to pay for both
 re-engineering and the cost of running a business.

Sure, that's why I said it's David vs. Goliath. However, let's be honest
here, the X1 isn't a terribly complex product. It's quite literally a
tiny computer with some DRAM and a feature to dump DRAM contents to
Flash (and read it back later) in case power fails. That's it.

Cheers,
--
Saso


Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-07 Thread Christopher George
Very impressive IOPS numbers, although I have some thoughts on the
benchmarking method itself. IMHO the comparison shouldn't be raw IOPS
numbers on the DDRdrive itself as tested with Iometer (it's only 4GB),


The purpose of the benchmarks presented is to isolate the inherent capability
of just the SSD in a simple/synthetic/sustained Iometer 4KB random write test.
This test successfully illuminates a critical difference between a Flash-only and a
DRAM/SLC-based SSD.  Flash-only SSD vendors are *less* than forthright
in their marketing when specifying their 4KB random write capability.  I am
surprised vendors are not called out for marketing FOB (fresh out of the box)
results that (even with TRIM support) are not sustainable.  Intel was a notable
exception until they also introduced SSDs based on SandForce controllers.

In the section prior to the benchmarks, titled "ZIL Accelerator access pattern
random and/or sequential", I show an example workload and how it translates
to an actual log device's access pattern.  It clearly shows a wide (21-71%)
spectrum of random write accesses.  So before even presenting any Iometer
results, I don't believe I indicate or even imply that real-world workloads will
somehow be 100% 4KB random-write based.  For the record, I agree with you:
they are obviously not!


real-world numbers on a real-world pool consisting of spinning disks with
the DDRdrive acting as a ZIL accelerator.


Benchmarking is frustrating for us also, as what is a real-world pool?
And if we picked one to benchmark, how relevant would it be to others?

1) number of vdevs (we see anywhere from one to massive)
2) vdev configuration (only mirrored pairs to 12 disk raidz2)
3) HDD type (low rpm green HDDs to SSD only pools)
4) host memory size (we see not enough to 192GB+)
5) number of host CPUs (you get the picture)
6) network connection (1GB to multiple 10GB)
7) number of network ports
8) direct connect to client or through a switch(es)

Is the ZFS pool accessed using NFS or iSCSI?
What is the client OS?
What is the client configuration?
What is the workload composition (read/async write/sync write)?
What is the workload access pattern (sequential/random)?
...

This could be just good enough for small businesses and moderate sized 
pools.


No doubt, we are also very clear on who we target (enterprise customers).

The beauty of ZFS is the flexibility of its implementation.  By supporting
multiple log device types and configurations it ultimately enables a broad
range of performance capabilities!


Best regards,
Chris

--
Christopher George
cgeorge at ddrdrive.com
http://www.ddrdrive.com