Compaction logs show the number of bytes written and the level they were written to.
Base write load = the bytes flushed from memtables into L0 for the table.
Write amplification = the sum of all compaction bytes subsequently written to disk for that table.
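
A rough sketch of that bookkeeping in Python (the compaction log line format
below is an assumption and changes between Cassandra versions, so adapt the
regex and the flush-line handling to whatever your system.log actually prints):

    import re
    from collections import defaultdict

    # Assumed shape of a compaction completion line, roughly:
    #   ... Compacted (...) 4 sstables to [...] to level=2.
    #   1,234,567 bytes to 1,200,000 (~97% of original) in ...
    COMPACTED = re.compile(
        r"Compacted .*?to level=(?P<level>\d+)\.\s+"
        r"(?P<in_bytes>[\d,]+) bytes to (?P<out_bytes>[\d,]+)")

    compacted_per_level = defaultdict(int)
    total_compacted = 0
    with open("system.log") as log:
        for line in log:
            m = COMPACTED.search(line)
            if m:
                written = int(m.group("out_bytes").replace(",", ""))
                compacted_per_level[int(m.group("level"))] += written
                total_compacted += written

    # flushed_bytes is the base write load: bytes flushed from memtables into
    # L0 over the same window, gathered the same way from the flush lines in
    # the log (or from your own metrics).
    def write_amplification(flushed_bytes, compacted_bytes):
        # Every byte hits disk once at flush time and again each time a
        # compaction rewrites it.
        return (flushed_bytes + compacted_bytes) / flushed_bytes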

On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu <dikan...@gmail.com> wrote:

> Hi Matt,
>
> Thanks for the detailed explanation! Yes, this is exactly what I'm looking
> for, "write amplification = data written to flash/data written by the
> host".
>
> We are using LCS heavily in production, so I'd like to figure out the
> amplification caused by that and see what we can do to optimize it. I have
> the metrics for "data written to flash", and I'm wondering whether there
> is an easy way to get the "data written by the host" on each C* node?
>
> Thanks
>
> On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkenn...@datastax.com>
> wrote:
>
>> TL;DR - Cassandra actually causes a ton of write amplification but it
>> doesn't freaking matter any more. Read on for details...
>>
>> That slide deck does have a lot of very good information on it, but
>> unfortunately I think it has led to a fundamental misunderstanding about
>> Cassandra and write amplification. In particular, slide 51 vastly
>> oversimplifies the situation.
>>
>> The Wikipedia definition of write amplification looks at this from the
>> perspective of the SSD controller:
>> https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value
>>
>> In short, write amplification = data written to flash / data written by
>> the host
>>
>> So, if I write 1MB in my application, but the SSD has to write my 1MB
>> plus rearrange another 1MB of data in order to make room for it, then the
>> drive has written a total of 2MB and my write amplification is 2x.
>>
>> In other words, it is measuring how much extra the SSD controller has to
>> write in order to do its own housekeeping.
>>
>> However, the Wikipedia definition is a bit more constrained than how the
>> term is used in the storage industry. The whole point of looking at write
>> amplification is to understand the impact that a particular workload is
>> going to have on the underlying NAND by virtue of the data written. So a
>> definition of write amplification that is a little more relevant to the
>> context of Cassandra is this:
>>
>> write amplification = data written to flash / data written to the database
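>>
>> Spelled out as code, the two ratios look like this (a toy sketch; the byte
>> counts below are made-up numbers, not measurements):
>>
>>     # Controller-level WA: NAND bytes physically written / bytes the host
>>     # sent to the SSD (the Wikipedia definition above).
>>     # Application-level WA: NAND bytes physically written / bytes the
>>     # application logically wrote to the database.
>>     def write_amp(flash_bytes, logical_bytes):
>>         return flash_bytes / logical_bytes
>>
>>     # Hypothetical numbers: the app writes 1 TB to Cassandra, flush plus
>>     # compaction turn that into 6 TB of SSTable writes (what the host
>>     # sends to the SSD), and the controller adds 0.5 TB of housekeeping.
>>     print(write_amp(6.5e12, 6.0e12))  # controller-level, ~1.08x
>>     print(write_amp(6.5e12, 1.0e12))  # application-level, 6.5x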
>>
>> So, while the fact that we only sequentially write large immutable
>> SSTables does in fact mean that controller-level write amplification is
>> near zero, compaction comes along and completely destroys that tidy little
>> story. Think about it: every time a compaction re-writes data that has
>> already been written, we are creating a lot of application-level write
>> amplification. Different compaction strategies and the workload itself
>> determine what the real application-level write amp is, but generally
>> speaking, LCS is the worst, followed by STCS, and DTCS will cause the
>> least write amp. To measure this, you can usually use smartctl (or another
>> mechanism, depending on the SSD manufacturer) to get the physical bytes
>> written to your SSDs and divide that by the data that you've actually
>> logically written to Cassandra. I've measured (more than two years ago)
>> LCS write amp as high as 50x on some workloads, which is significantly
>> higher than the typical controller-level write amp on a b-tree style
>> update-in-place data store. Also note that the new storage engine
>> generally reduces a lot of inefficiency in how Cassandra lays data out on
>> disk, thereby reducing the impact of write amp due to compaction.
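>>
>> A rough sketch of that measurement (assuming a drive that exposes the
>> common Total_LBAs_Written SMART attribute in 512-byte units; the attribute
>> name and unit vary by vendor, and you supply the logical byte count
>> yourself):
>>
>>     import subprocess
>>
>>     def flash_bytes_written(device="/dev/sda", lba_size=512):
>>         """Physical bytes the SSD reports having written (needs root)."""
>>         out = subprocess.run(["smartctl", "-A", device], check=True,
>>                              capture_output=True, text=True).stdout
>>         for line in out.splitlines():
>>             if "Total_LBAs_Written" in line:
>>                 # The raw value is the last column of the attribute row.
>>                 return int(line.split()[-1]) * lba_size
>>         raise RuntimeError("drive does not report Total_LBAs_Written")
>>
>>     # Snapshot the counter before and after a test window, then divide the
>>     # delta by the bytes you logically wrote to Cassandra in that window.
>>     def write_amplification(flash_delta_bytes, logical_bytes):
>>         return flash_delta_bytes / logical_bytes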
>>
>> However, if you're a person who understands SSDs, at this point you're
>> wondering why we aren't burning out SSDs right and left. The reality is
>> that general SSD endurance has gotten so good that all this write amp
>> isn't really a problem any more. If you're curious to read more about
>> that, I recommend you start here:
>>
>>
>> http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash
>>
>> and the paper that article mentions:
>>
>> http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf
>>
>>
>> Hope this helps.
>>
>>
>> Matt Kennedy
>>
>>
>>
>> On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> This is a good source on Cassandra + write amplification:
>>> http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
>>>
>>> 2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:
>>>
>>>> Cassandra should not cause any write amplification. Write amplification
>>>> happens only when you update data on SSDs. Cassandra does not update any
>>>> data in place. Data can be rewritten during compaction but it is never
>>>> updated.
>>>>
>>>> Benjamin
>>>>
>>>> On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Dikang,
>>>> >
>>>> > I am not sure about what you call "amplification", but as sizes highly
>>>> > depend on the structure, I think I would probably give it a try using
>>>> > CCM (https://github.com/pcmanus/ccm) or some test cluster with
>>>> > 'production like' settings and schema. You can write a row, flush it
>>>> > and see how big the data is cluster-wide / per node.
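>>>> >
>>>> > A rough sketch of that experiment in Python (assuming ccm and the
>>>> > cassandra-driver package are installed, and that ccm keeps node data
>>>> > under ~/.ccm/; adjust the version, paths and schema to your setup):
>>>> >
>>>> >     import glob
>>>> >     import os
>>>> >     import subprocess
>>>> >
>>>> >     from cassandra.cluster import Cluster  # pip install cassandra-driver
>>>> >
>>>> >     # Spin up a throwaway three-node test cluster with ccm.
>>>> >     subprocess.run(["ccm", "create", "wa_test", "-v", "3.0.4",
>>>> >                     "-n", "3", "-s"], check=True)
>>>> >
>>>> >     session = Cluster(["127.0.0.1"]).connect()
>>>> >     session.execute("CREATE KEYSPACE ks WITH replication = "
>>>> >                     "{'class': 'SimpleStrategy', 'replication_factor': 3}")
>>>> >     session.execute("CREATE TABLE ks.t (id int PRIMARY KEY, payload text)")
>>>> >     session.execute("INSERT INTO ks.t (id, payload) VALUES (1, 'a row')")
>>>> >
>>>> >     # Force the memtables to disk so the row shows up as SSTable data.
>>>> >     for node in ("node1", "node2", "node3"):
>>>> >         subprocess.run(["ccm", node, "nodetool", "flush"], check=True)
>>>> >
>>>> >     # Compare on-disk size per node with what was logically written.
>>>> >     def dir_size(path):
>>>> >         return sum(os.path.getsize(os.path.join(root, name))
>>>> >                    for root, _, files in os.walk(path) for name in files)
>>>> >
>>>> >     for node in ("node1", "node2", "node3"):
>>>> >         for data_dir in glob.glob(
>>>> >                 os.path.expanduser("~/.ccm/wa_test/%s/data*" % node)):
>>>> >             print(node, data_dir, dir_size(data_dir), "bytes on disk")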
>>>> >
>>>> > Hope this will be of some help.
>>>> >
>>>> > C*heers,
>>>> > -----------------------
>>>> > Alain Rodriguez - al...@thelastpickle.com
>>>> > France
>>>> >
>>>> > The Last Pickle - Apache Cassandra Consulting
>>>> > http://www.thelastpickle.com
>>>> >
>>>> > 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:
>>>> >
>>>> > > Hello there,
>>>> > >
>>>> > > I'm wondering whether there is a good way to measure the write
>>>> > > amplification of Cassandra?
>>>> > >
>>>> > > I'm thinking it could be calculated by (size of mutations written to
>>>> > > the node)/(number of bytes written to the disk).
>>>> > >
>>>> > > Do we already have a metric for "size of mutations written to the
>>>> > > node"? I did not find it in the JMX metrics.
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > > --
>>>> > > Dikang
>>>> > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Dikang
>
>
