> What about 15 gb?

Not ok :) Don't let a single partition get to 1gb; hundreds of mb is when
the flares should be going up. The main reason is that compactions would be
horrifically slow and there would be a lot of gc pain. Bringing the time
bucket down to a day will probably be sufficient. It would take billions of
alarm events in a single time bucket, if that's the entire data payload, to
get that bad.
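
To make that concrete, a rough sketch with the python driver (the table
and column names just follow the sketch from your first mail; the ttl and
compaction settings are the ones from my earlier reply quoted below, and
the keyspace/contact point are made up):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])        # assumption: a local test node
    session = cluster.connect('alerts_ks')  # hypothetical keyspace

    # Day-granularity bucket so a single (user, day) partition stays in
    # the 100s-of-mb range at worst, even for very noisy users.
    session.execute("""
        CREATE TABLE alerts (
            user_id     uuid,
            time_bucket text,        -- e.g. '20150204' (yyyymmdd)
            ts          timestamp,
            alert_id    uuid,
            payload     text,
            PRIMARY KEY ((user_id, time_bucket), ts, alert_id)
        ) WITH default_time_to_live = 63072000   -- ~2 years, in seconds
          AND compaction = {'class': 'DateTieredCompactionStrategy'}
    """)

You can then watch the resulting partition sizes with nodetool cfstats and
widen or shrink the bucket from there.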

> If I use paging, Cassandra won't try to allocate the whole partition on
> the server node; it will just allocate memory in the heap for that page.
> Check?

Cassandra should never allocate an entire (large/wide) partition into
memory unless you're telling it to on a read. (gross simplification coming
up here) You can think of it more as if it's streaming the partition's data
from disk (more or less), filling a response to your query. Don't ask for
1gb of data and you won't get 1gb of objects in your heap. Wide rows work
well, but keeping them smaller is an optimization that will save you a lot
of pain down the road from troublesome jvm gcs, slower compactions,
unbalanced nodes, and higher read latencies.
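
With the python driver, for instance, paging is just a fetch_size on the
statement, and iterating the result pulls pages behind the scenes. A
sketch reusing the table above (the page size and the handle() callback
are made up):

    from cassandra.query import SimpleStatement

    # Continuing with the session from the sketch above. fetch_size makes
    # the driver pull 500 rows per round trip instead of materializing
    # the whole partition slice in a single response.
    query = SimpleStatement(
        "SELECT * FROM alerts "
        "WHERE user_id = %s AND time_bucket = %s AND ts > %s AND ts < %s",
        fetch_size=500)

    # Iterating transparently fetches the next page as needed.
    for row in session.execute(query, (user_id, '20150204', start, end)):
        handle(row)   # hypothetical per-alert processing

If your X..Y range spans more than one day, you just run the same query
once per bucket in the range.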

Chris

On Wed, Feb 4, 2015 at 9:33 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:

> > The data model lgtm. You may need to balance the size of the time
> > buckets with the amount of alarms to prevent partitions from getting
> > too large. 1 month may be a little large; I would aim to keep the
> > partitions below 25mb (can check with nodetool cfstats) or so in size
> > to keep everything happy. It's ok if occasional ones go larger;
> > something like 1gb can be bad... but it would still work, if not very
> > efficiently.
>
> What about 15 gb?
>
> > Deletes on an entire time-bucket at a time seems like a good approach,
> > but just setting a TTL would be far far better imho (why not just set
> > it to two years?). May want to look into the new
> > DateTieredCompactionStrategy or LeveledCompactionStrategy; otherwise
> > the obsoleted data will very rarely go away.
>
> Excellent hint, I will take a good look at this. I didn't know about
> DateTieredCompactionStrategy.
>
> > When reading, just be sure to use paging (the good cql drivers will
> > have it built in) and don't actually read it all in one massive query.
> > If you decrease the size of your time bucket you may end up having to
> > page the query across multiple partitions if Y-X > bucket size.
>
> If I use paging, Cassandra won't try to allocate the whole partition on
> the server node; it will just allocate memory in the heap for that page.
> Check?
>
> Marcelo Valle
>
> From: user@cassandra.apache.org
> Subject: Re: data distribution along column family partitions
>
> The data model lgtm.  You may need to balance the size of the time buckets
> with the amount of alarms to prevent partitions from getting too large.  1
> month may be a little large; I would aim to keep the partitions below 25mb
> (can check with nodetool cfstats) or so in size to keep everything
> happy.  It's ok if occasional ones go larger; something like 1gb can be
> bad... but it would still work, if not very efficiently.
>
> Deletes on an entire time-bucket at a time seems like a good approach, but
> just setting a TTL would be far far better imho (why not just set it to two
> years?).  May want to look into the new DateTieredCompactionStrategy or
> LeveledCompactionStrategy; otherwise the obsoleted data will very rarely go
> away.
>
> When reading, just be sure to use paging (the good cql drivers will have it
> built in) and don't actually read it all in one massive query.  If you
> decrease the size of your time bucket you may end up having to page the
> query across multiple partitions if Y-X > bucket size.
>
> Chris
>
> On Wed, Feb 4, 2015 at 4:34 AM, Marcelo Elias Del Valle <
> mvall...@gmail.com> wrote:
>
>> Hello,
>>
>> I am designing a model to store alerts users receive over time. I will
>> want to store probably the last two years of alerts for each user.
>>
>> The first thought I had was having a column family partitioned by user +
>> time bucket, where the time bucket could be something like year + month.
>> For instance:
>>
>> partition key:
>> user-id = f47ac10b-58cc-4372-a567-0e02b2c3d479
>> time-bucket = 201502
>> rest of primary key:
>> timestamp = column of type timestamp
>> alert-id = f47ac10b-58cc-4372-a567-0e02b2c3d480
>>
>> Question: would this make it easier to delete old data? Supposing I am
>> not using TTL and I want to remove alerts older than 2 years, what would
>> be better: deleting the entire time-bucket for each user-id (through a
>> map/reduce process), or having just user-id as the partition key and
>> deleting, for each user, where X > timestamp > Y?
>>
>> Is it the same for Cassandra, internally?
>>
>> Another question is: would data be distributed well enough if I just
>> chose to partition by user-id? I will have some users with a large number
>> of alerts, but on average I could assume alerts would have a good
>> distribution along user ids. The problem is I don't feel confident that a
>> few partitions with A LOT of alerts would not be a problem at read time.
>>
>> What happens at read time when I try to read data from a big partition?
>> Like, I want to read alerts for a user where X > timestamp > Y, but it
>> would return 1 million alerts. As it's all in a single partition, this
>> read will occur on the same node, thus allocating a lot of memory for
>> this single operation, right?
>>
>> What if the memory needed for this operation is bigger than what fits in
>> the java heap? Would this be a problem for Cassandra?
>>
>>
>> Best regards,
>> --
>> Marcelo Elias Del Valle
>> http://mvalle.com - @mvallebr
>>
>>
>
