Let's say you're logging events, and you have billions of them. Suppose the events come in bursts: millions of events within a day, but they all arrive within microseconds of each other a few times a day. How do you find the events that happened on a particular day if you can't store them all in one row?
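A minimal sketch of one way this could be laid out, assuming day-bucketed row keys split across a few shard rows so a burst doesn't all land in one place, and column names built from the timestamp plus a random suffix so events in the same microsecond don't collide. All names here (NUM_SHARDS, client.insert, client.get_slice) are assumptions for illustration, not an actual client API:

    # Hypothetical sketch: day-bucketed, sharded rows for bursty event logging.
    import uuid
    from datetime import datetime

    NUM_SHARDS = 16  # assumption: spread each day over a few rows to absorb bursts

    def event_row_key(ts: datetime, shard: int) -> str:
        """Row key = day bucket + shard, e.g. 'events:20100602:07'."""
        return "events:%s:%02d" % (ts.strftime("%Y%m%d"), shard)

    def event_column_name(ts: datetime) -> str:
        """Column name = zero-padded microsecond timestamp + random suffix,
        so events arriving in the same microsecond don't overwrite each other."""
        micros = int(ts.timestamp() * 1_000_000)
        return "%020d:%s" % (micros, uuid.uuid4().hex[:8])

    def write_event(client, ts: datetime, payload: bytes) -> None:
        # Pick a shard at random so one burst fans out across rows.
        shard = uuid.uuid4().int % NUM_SHARDS
        client.insert(event_row_key(ts, shard), event_column_name(ts), payload)

    def read_day(client, day: datetime):
        """Read back a whole day by slicing every shard row for that date;
        columns come back sorted by name, i.e. by timestamp."""
        for shard in range(NUM_SHARDS):
            for name, value in client.get_slice(event_row_key(day, shard)):
                yield name, value

Merging the shard slices by column name would give the day's events back in time order.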
On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook <jsh...@gmail.com> wrote:
> Either OPP by key, or within a row by column name. I'd suggest the latter.
> If you have structured data to stick under a column (named by the
> timestamp), then you can serialize and unserialize it yourself, or you
> can use a supercolumn. It's effectively the same thing. Cassandra
> only provides the super column support as a convenience layer as it is
> currently implemented. That may change in the future.
>
> You didn't make clear in your question why a standard column would be
> less suitable. I presumed you had layered structure within the
> timestamp, hence my response.
>
> How would you logically partition your dataset according to natural
> application boundaries? This will answer most of your question.
> If you have a dataset which can't be partitioned into a reasonable
> size row, then you may want to use OPP and key concatenation.
>
> What do you mean by giant?
>
> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn <da...@lookin2.com> wrote:
> > How do I handle giant sets of ordered data, e.g. by timestamps, which I
> > want to access by range?
> >
> > I can't put all the data into a supercolumn, because it's loaded into
> > memory at once, and it's too much data.
> >
> > Am I forced to use an order-preserving partitioner? I don't want the
> > headache. Is there any other way?
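For the "OPP and key concatenation" alternative mentioned above, a rough sketch of what the key scheme might look like, assuming one row per time bucket with concatenated keys like 'applog:2010-06-02'. With an order-preserving partitioner those keys could be range-scanned directly; with a random partitioner you can still enumerate the buckets covering an interval and fetch each one. The bucket naming and the get_slice call are assumptions for illustration, not a real API:

    # Hypothetical sketch: key concatenation with one row per day bucket.
    from datetime import date, timedelta

    def bucket_keys(dataset: str, start: date, end: date):
        """Yield the concatenated row keys covering [start, end], one per day."""
        day = start
        while day <= end:
            yield "%s:%s" % (dataset, day.isoformat())
            day += timedelta(days=1)

    def read_range(client, dataset: str, start: date, end: date):
        """Fetch every bucket in the interval; within each row the columns
        are already ordered by their timestamp-based names."""
        for key in bucket_keys(dataset, start, end):
            for column_name, value in client.get_slice(key):
                yield column_name, value

    # Usage: read_range(client, "applog", date(2010, 6, 1), date(2010, 6, 2))

Because the keys for a given interval can be enumerated up front, this layout works even without an order-preserving partitioner, which sidesteps the headache the original question was trying to avoid.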