> Date: Thu, 13 Jan 2011 01:29:33 +0100
> Subject: Re: Advice wanted on modeling
> From: [email protected]
> To: [email protected]
>
> > The application will have a large number of records, with the records
> > consisting of a fixed part and a number (n) of periodic parts.
> > * The fixed part is updated occasionally.
> > * The periodic parts are never updated, but a new one is added every 5 to 10
> > minutes. Only the last n periodic parts need to be kept, so that the oldest
> > one can be deleted after adding a new part.
> > * The records will always be read completely (meaning fixed part and all
> > periodic parts). Reads are less frequent than writes.
> > The application will be running continuously, at least for a few weeks, so
> > there will be many, many stale periodic parts, so I'm a bit worried about
> > data consumption and compactions.
>
> I was going to hit send on a partial recommendation but realized I
> don't really have enough information given that you seem to be making
> pretty specific optimizations.
>
> You say writes are more frequent than reads. To what extent? Are
> reads *very* infrequent, to the point that their performance is
> almost completely irrelevant?
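The record layout described in the original question can be sketched as a minimal in-memory Python model. This is not Cassandra code; `Record` and `add_periodic` are hypothetical names. It shows that once a record holds its n periodic parts, every new part is paired with one deletion, and in Cassandra each such deletion is written as a tombstone that only disappears when compaction purges it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical in-memory model: a fixed part plus the last n periodic parts."""
    fixed: dict
    n: int
    periodic: deque = field(default_factory=deque)
    deletions: int = 0  # parts dropped so far; in Cassandra each would be a tombstone

    def add_periodic(self, part):
        # New periodic parts are only ever appended, never updated.
        self.periodic.append(part)
        if len(self.periodic) > self.n:
            # Only the last n parts are kept, so the oldest one is deleted.
            self.periodic.popleft()
            self.deletions += 1

    def read(self):
        # Reads always return the fixed part and all retained periodic parts.
        return self.fixed, list(self.periodic)


rec = Record(fixed={"name": "sensor-1"}, n=3)
for i in range(10):
    rec.add_periodic({"seq": i})

fixed, parts = rec.read()
print([p["seq"] for p in parts])  # [7, 8, 9]
print(rec.deletions)              # 7
```

After the first n parts, the number of deletions grows one-for-one with the number of appends.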

What exactly is a write? Is it a single record update, or a batch of record
updates executed in one operation? In my case I'm batching about a thousand
record updates (new periodic parts) into a single batch_mutate. A read would
consist of fetching all parts of a single record. In the text below I use
the term "update" to mean a record update.
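Grouping record updates into batches of about a thousand can be sketched as below, assuming updates arrive as an iterable of (record key, periodic part) pairs. `chunked` is a hypothetical helper; each list it yields would then be handed to whatever client call issues the batch_mutate, which is deliberately not shown because it depends on the client library.

```python
from itertools import islice


def chunked(updates, size=1000):
    """Yield successive lists of at most `size` updates (hypothetical helper)."""
    it = iter(updates)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical stream of 2,500 record updates: (record key, new periodic part).
updates = [(f"record-{i}", {"seq": i}) for i in range(2500)]
batches = list(chunked(updates, size=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```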

I typically expect a few reads for every thousand updates (<1%), although
read pressure will vary considerably over time. I don't expect more than a
hundred reads for every thousand updates (10%). Read performance is not
irrelevant, but it is definitely subordinate to write performance, which is
crucial (and one of the reasons I selected Cassandra).

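Turning those ratios into rates, assuming the write estimate of about a thousand record updates per second given elsewhere in this message, gives a quick check like this (variable names are illustrative only):

```python
updates_per_second = 1000  # write-rate estimate stated in this message

# Reads per thousand updates, from the paragraph above.
typical_reads_per_1000 = 10  # upper end of "<1%"
peak_reads_per_1000 = 100    # "no more than a hundred reads for every thousand updates"

typical_reads_per_second = updates_per_second * typical_reads_per_1000 // 1000
peak_reads_per_second = updates_per_second * peak_reads_per_1000 // 1000

print(typical_reads_per_second)  # 10
print(peak_reads_per_second)     # 100
```

So even at peak, reads stay roughly an order of magnitude below updates.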
> You seem worried about tombstones and data size. Is the issue that
> you're expecting huge amounts of data and disk space/compaction
> frequency is an issue?

Yes, I am expecting huge amounts of data, and without compaction I would
soon (within a few days to a week) run out of disk space.

> Are you expecting write load to be high such that performance of
> writes (and compaction) is a concern, or is it mostly about slowly
> building up huge amounts of data that you want to be compact on disk?

I'm not sure. My write load is high, estimated at a thousand record
updates per second (batched, of course).
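A back-of-the-envelope estimate of what that rate implies, assuming each update is exactly one new periodic part and that every record receives a new part at the same 5 to 10 minute interval (names are illustrative only):

```python
SECONDS_PER_DAY = 24 * 60 * 60

updates_per_second = 1000   # write-rate estimate from this message
interval_minutes = (5, 10)  # each record gets a new periodic part every 5 to 10 minutes

# Every update adds one periodic part to one record, so in steady state
# the number of records is roughly updates_per_second * interval.
records = {m: updates_per_second * m * 60 for m in interval_minutes}
for m, count in records.items():
    print(f"new part every {m} min -> about {count:,} records")

# Once each record holds its n parts, every new part also deletes the oldest
# one, so deletions (tombstones in Cassandra) arrive as fast as inserts.
parts_per_day = updates_per_second * SECONDS_PER_DAY
print(f"about {parts_per_day:,} new parts and as many deletions per day")
```

Over two weeks of continuous operation that is already more than a billion deletions, which is why compaction behaviour matters here.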