On Wed, Nov 2, 2016 at 2:17 PM, Jeff Dasch <[email protected]> wrote:

> Hi,
>
> I apologize if this has been previously answered; I haven't figured out
> how to search the list effectively yet.
>
> Does putting the WAL on a separate SSD improve write latency, write
> throughput, or both?  If I'm only interested in throughput, would a SAS
> drive be sufficient?
>

Typically it improves both: latency is more stable, and peak throughput can
be better if you are WAL-bound. However, if your dataset is quite large and
your writes are random, you probably aren't WAL-bound.

You might try running 'iostat -dxm 1' during your workload to get a sense
of whether your current WAL disk is a bottleneck or not: if you frequently
see high 'await' times and 'util' percentages on the WAL drive, then you're
likely to get some benefit by moving it to an SSD.
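
If you'd rather not eyeball the output, here is a rough Python sketch that
pulls the await and %util columns for one device out of captured iostat
output. The function names, the device name, and the thresholds (20 ms,
80%, half of the samples) are placeholders I made up for illustration, not
Kudu recommendations, and the column names differ between sysstat versions,
so adjust to whatever your iostat prints.

```python
def parse_iostat(text, device):
    """Return a list of (await_ms, util_pct) samples for `device`.

    Expects the extended report format, where each report starts with a
    header line beginning with 'Device' (or 'Device:' on older sysstat).
    """
    samples = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields
            continue
        if header and fields[0] == device and len(fields) == len(header):
            row = dict(zip(header[1:], fields[1:]))
            # Prefer the write-specific column when present; older sysstat
            # versions only report a combined 'await'.
            await_ms = float(row.get("w_await", row.get("await", "nan")))
            util_pct = float(row.get("%util", "nan"))
            samples.append((await_ms, util_pct))
    return samples


def looks_wal_bound(samples, await_ms=20.0, util_pct=80.0, fraction=0.5):
    """True if at least `fraction` of samples exceed both thresholds.

    The default thresholds are arbitrary illustrative values.
    """
    if not samples:
        return False
    busy = sum(1 for a, u in samples if a >= await_ms and u >= util_pct)
    return busy / len(samples) >= fraction
```

For example, capture 'iostat -dxm 1 60 > iostat.txt' during the workload,
then call looks_wal_bound(parse_iostat(open("iostat.txt").read(), "sdb"))
with your WAL device in place of "sdb". Note that the first report iostat
prints covers the time since boot, so you may want to discard it.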


> Are there any benchmarks showing shared vs dedicated drive vs dedicated
> SSD?
>
>
I've run a couple with YCSB comparing shared vs dedicated HDD, but haven't
done a rigorous comparison of SSD WAL vs HDD. Would be interested if you
can report back any findings.


> Also, I assume a small 100GB drive would be an adequate size for the WAL?
>
>
Yes, I think that should be sufficient for most workloads. Current versions
of Kudu default to maintaining a maximum of 10 log segments per tablet for
catching up other peers, and each segment defaults to 64MB, so 100GB of
space should be enough for ~150 tablets at their maximum WAL usage. However,
tablets can also retain WALs because they're retaining in-memory data, so
if you are running on machines with large amounts of RAM (256GB) you might
end up getting close to the 100GB space limit if your workload is extremely
insert-heavy.
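
To spell out the arithmetic behind that estimate, here is a small Python
sketch. It only covers the segments kept for catching up peers, using the
defaults mentioned above (10 segments of 64MB each); it does not model the
extra WAL retained for in-memory data, and the function names are just
made up for this example.

```python
def wal_mb_per_tablet(max_segments=10, segment_mb=64):
    """Peak WAL footprint of one tablet from retained segments, in MB."""
    return max_segments * segment_mb


def tablets_at_max_wal(disk_gb, max_segments=10, segment_mb=64):
    """How many tablets fit on the WAL disk if each is at its peak."""
    disk_mb = disk_gb * 1024
    return disk_mb // wal_mb_per_tablet(max_segments, segment_mb)


# 10 * 64MB = 640MB per tablet, so a 100GB disk holds 160 tablets at peak,
# which is where the rough "~150" figure comes from.
print(wal_mb_per_tablet())        # 640
print(tablets_at_max_wal(100))    # 160
```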

My best recommendation would be to keep an eye on 'du -sh /path/to/wals'
during your workload, including some scenarios where another cluster node
is restarting, and get a sense of your typical usage.
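
If you want to log that over time rather than run du by hand, a minimal
Python sketch along these lines would do. The function name is made up, and
note that it sums apparent file sizes, so the number can differ somewhat
from what du reports (du counts allocated disk blocks).

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # WAL segments can be deleted while we are walking the
                # tree; skip anything that disappeared.
                pass
    return total
```

Calling dir_size_bytes("/path/to/wals") once a minute from a small loop or a
cron job and recording the result should give you a usable picture of peak
usage, including during a node restart.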

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
