On Mon, Jun 10, 2013 at 5:44 PM, Lucas Stanley wrote:
> If I understand HBase's architecture correctly, it is only the WAL that
> needs to be placed on a SSD to make writes perform better?
>

I'm skeptical when looking at the whole picture.

Depending on which version of HDFS you are using, and its configuration, writes to the WAL can be acked by three datanodes (including one off rack, presumably under separate power) after being received into memory, without waiting for fsync. This already operates in the network and memory latency regimes, not that of spinning media, so the benefit SSDs could provide here is maybe less than one might think. For many use cases this persistence strategy is good enough.

For the paranoid, who want to avoid *any* data loss upon a total datacenter power failure as far as possible, it's necessary to configure the datanodes not to ack until after fsync completes on the blocks in progress. In that case I presume using SSDs will reduce the average latencies involved, but SSDs can also have periods of terrible write latency caused by garbage collection in the FTL and other reasons, with worst cases I have heard of upwards of 40 seconds. That's significantly worse than the worst cases for spinning media.

Also, SSDs are susceptible to data corruption upon sudden power loss. I've heard of solid state devices being, surprisingly, totally or partially (as in a third of the device) bricked by sudden power loss. If you think of FTLs as embedded custom filesystems of varying maturity, maybe this shouldn't be so surprising. So even if fsync completes on the SSD before the power failure, you may still lose everything on it. That's also a worst case worse than is typical for spinning media. (How frequent? I don't know. But I'm a pessimist by training.)

Taking a step back, you can turn off writes to the WAL selectively, making an informed trade-off between performance and data loss risk on a per-application or per-write basis, and administratively flush memstores for persistence dynamically, independent of the WAL.
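To put a rough number on that trade-off: edits written without the WAL (in the Java client, a per-mutation setting such as `Durability.SKIP_WAL`) live only in the memstore until it is flushed, so what you stand to lose on a region server crash is bounded by what arrived since the last flush. The sketch below is back-of-envelope arithmetic, not HBase code. `WriteStream`, `max_edits_at_risk`, and `max_bytes_at_risk` are hypothetical names, the workload figures are made up, and it assumes some flush (periodic, size-triggered, or administrative) happens at least every `flush_interval_s` seconds.

```python
from dataclasses import dataclass


@dataclass
class WriteStream:
    """Hypothetical description of a stream of edits that skip the WAL."""
    edits_per_second: float  # rate of WAL-skipping puts (assumed)
    bytes_per_edit: int      # average serialized edit size (assumed)


def max_edits_at_risk(stream: WriteStream, flush_interval_s: float) -> float:
    """Upper bound on edits lost if a region server dies just before a flush.

    Edits that skip the WAL exist only in the memstore until it is flushed
    to an HFile, so the exposure window is at most the time since the last
    flush. Assumes a flush happens at least every `flush_interval_s` seconds.
    """
    return stream.edits_per_second * flush_interval_s


def max_bytes_at_risk(stream: WriteStream, flush_interval_s: float) -> float:
    """Same bound expressed in bytes of edit payload."""
    return max_edits_at_risk(stream, flush_interval_s) * stream.bytes_per_edit


if __name__ == "__main__":
    # Hypothetical workload: 5,000 WAL-skipping edits/s of ~200 bytes each.
    stream = WriteStream(edits_per_second=5_000, bytes_per_edit=200)
    for interval in (10, 60, 300):
        mib = max_bytes_at_risk(stream, interval) / 2**20
        print(f"flush every {interval:>3} s -> up to "
              f"{max_edits_at_risk(stream, interval):,.0f} edits "
              f"(~{mib:.1f} MiB) at risk")
```

The bound grows linearly with the flush interval, so flushing more often shrinks the window, at the cost of more, smaller HFiles and therefore more compaction work.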
There are knobs available for increasing write performance, depending on your tolerance for risk, in the absence today of support for tiered storage in HBase/HDFS.

On the other hand, random read workloads should benefit from having the backing HFiles of hot, read-mostly data placed on SSD storage. In my opinion SSDs are best for read-heavy workloads: there are long periods without writes in which to reach a steady state, and they will live longer the fewer writes they are subjected to. Random reads of working sets that exceed the capacity of the blockcache are clearly constrained by the physical limits of rotational media. Moving HBase storage from disks to SSDs able to sustain orders of magnitude more read IOPS should produce a benefit; for a given workload, the greater the difference between the IOPS disks can drive and the IOPS SSDs can drive, the greater the potential benefit.

We are doing R&D in this area over at Intel and plan to publish experimental results in a few months.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
