Let's say you're logging events, and you have billions of them. Suppose the events come in bursts: millions of events within a day, but they all arrive within microseconds of each other a few times a day. How do you find the events that happened on a particular day if you can't store them all in one row?
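A minimal sketch of one way this could be laid out, assuming day-bucketed row keys split across a few shard rows so a burst doesn't all land in one place, and column names built from the timestamp plus a random suffix so events in the same microsecond don't collide. All names here (NUM_SHARDS, client.insert, client.get_slice) are assumptions for illustration, not an actual client API:

    # Hypothetical sketch: day-bucketed, sharded rows for bursty event logging.
    import uuid
    from datetime import datetime

    NUM_SHARDS = 16  # assumption: spread each day over a few rows to absorb bursts

    def event_row_key(ts: datetime, shard: int) -> str:
        """Row key = day bucket + shard, e.g. 'events:20100602:07'."""
        return "events:%s:%02d" % (ts.strftime("%Y%m%d"), shard)

    def event_column_name(ts: datetime) -> str:
        """Column name = zero-padded microsecond timestamp + random suffix,
        so events arriving in the same microsecond don't overwrite each other."""
        micros = int(ts.timestamp() * 1_000_000)
        return "%020d:%s" % (micros, uuid.uuid4().hex[:8])

    def write_event(client, ts: datetime, payload: bytes) -> None:
        # Pick a shard at random so one burst fans out across rows.
        shard = uuid.uuid4().int % NUM_SHARDS
        client.insert(event_row_key(ts, shard), event_column_name(ts), payload)

    def read_day(client, day: datetime):
        """Read back a whole day by slicing every shard row for that date;
        columns come back sorted by name, i.e. by timestamp."""
        for shard in range(NUM_SHARDS):
            for name, value in client.get_slice(event_row_key(day, shard)):
                yield name, value

Merging the shard slices by column name would give the day's events back in time order.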
On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook <jsh...@gmail.com> wrote:
> Either OPP by key, or within a row by column name. I'd suggest the latter.
> If you have structured data to stick under a column (named by the
> timestamp), then you can serialize and unserialize it yourself, or you
> can use a supercolumn. It's effectively the same thing. Cassandra
> only provides the super column support as a convenience layer as it is
> currently implemented. That may change in the future.
>
> You didn't make clear in your question why a standard column would be
> less suitable. I presumed you had layered structure within the
> timestamp, hence my response.
>
> How would you logically partition your dataset according to natural
> application boundaries? This will answer most of your question.
> If you have a dataset which can't be partitioned into a reasonable
> size row, then you may want to use OPP and key concatenation.
>
> What do you mean by giant?
>
> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn <da...@lookin2.com> wrote:
> > How do I handle giant sets of ordered data, e.g. by timestamps, which I
> > want to access by range?
> >
> > I can't put all the data into a supercolumn, because it's loaded into
> > memory at once, and it's too much data.
> >
> > Am I forced to use an order-preserving partitioner? I don't want the
> > headache. Is there any other way?
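For the "OPP and key concatenation" alternative mentioned above, a rough sketch of what the key scheme might look like, assuming one row per time bucket with concatenated keys like 'applog:2010-06-02'. With an order-preserving partitioner those keys could be range-scanned directly; with a random partitioner you can still enumerate the buckets covering an interval and fetch each one. The bucket naming and the get_slice call are assumptions for illustration, not a real API:

    # Hypothetical sketch: key concatenation with one row per day bucket.
    from datetime import date, timedelta

    def bucket_keys(dataset: str, start: date, end: date):
        """Yield the concatenated row keys covering [start, end], one per day."""
        day = start
        while day <= end:
            yield "%s:%s" % (dataset, day.isoformat())
            day += timedelta(days=1)

    def read_range(client, dataset: str, start: date, end: date):
        """Fetch every bucket in the interval; within each row the columns
        are already ordered by their timestamp-based names."""
        for key in bucket_keys(dataset, start, end):
            for column_name, value in client.get_slice(key):
                yield column_name, value

    # Usage: read_range(client, "applog", date(2010, 6, 1), date(2010, 6, 2))

Because the keys for a given interval can be enumerated up front, this layout works even without an order-preserving partitioner, which sidesteps the headache the original question was trying to avoid.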