On Wed, Apr 21, 2010 at 12:17 PM, Steve Lihn <stevel...@gmail.com> wrote:

> [...]



> Design 1: Each attribute is a super column. Therefore each date is a
> column. So we have:
>
> AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 }
> AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m }
> etc.
>
I would suggest not using this design, as each query involving an attribute
will pull all dates for that attribute into memory on the server. For
example, getting the closingPrice for AAPL on '2010-04-13' would pull all
closing prices for AAPL across all dates into memory.
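
To make the cost concrete, here is a toy in-memory model in Python. It is
not Cassandra client code: the dict layout, the helper name read_subcolumn
and all the prices and volumes are made up for illustration. It only
assumes what is described above, namely that reading one subcolumn
materializes the whole super column it lives in.

```python
from datetime import date, timedelta


def read_subcolumn(row, super_column, column):
    """Return (value, number of subcolumns loaded to answer the read).

    Toy assumption: the whole super column is materialized before a
    single subcolumn can be returned.
    """
    loaded = row[super_column]
    return loaded[column], len(loaded)


# 100 consecutive calendar days with made-up placeholder values.
days = [(date(2010, 1, 1) + timedelta(days=i)).isoformat() for i in range(100)]

# Design 1: attribute is the super column, dates are the subcolumns.
design1_row = {
    "closingPrice": {d: 240 + i for i, d in enumerate(days)},
    "volume": {d: 10_000_000 + i for i, d in enumerate(days)},
}

# Design 2: date is the super column, attributes are the subcolumns.
design2_row = {
    d: {"closingPrice": 240 + i, "volume": 10_000_000 + i}
    for i, d in enumerate(days)
}

price1, loaded1 = read_subcolumn(design1_row, "closingPrice", "2010-01-05")
price2, loaded2 = read_subcolumn(design2_row, "2010-01-05", "closingPrice")
print(loaded1, loaded2)  # 100 2
```

Both reads return the same price, but in this model Design 1 touches one
entry per stored date while Design 2 touches one entry per attribute.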


>
> Design 2: Each date is a super column. Therefore each attribute is a
> column. So we have:
>
> AAPL -> '2010-04-13' -> { closingPrice -> 242, volume -> 10.9m }
> AAPL -> '2010-04-14' -> {closingPrice -> 245, volume -> 14.4m }
> etc.
>
> The date column / superColumn will need Order Preserving Partitioner since
> we are going to do a lot of range queries.


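Partitioners split up row keys between nodes; the partitioner you use has no
effect on your ability to query columns in a row. Columns within a row are
kept sorted by the column family's comparator, so date range slices within a
row work with any partitioner.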
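
A rough Python sketch of the distinction, with the caveat that this is a
simplified model and not how Cassandra is implemented: node_for stands in
for a hash-based partitioner (the node names and the modulo placement are
my own simplification), and slice_columns stands in for a column range
query inside one row. The '2010-04-16' entry is made up.

```python
import hashlib
from bisect import bisect_left, bisect_right

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster


def node_for(row_key):
    """Pick a node from a hash of the row key (simplified placement)."""
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]


def slice_columns(sorted_columns, start, finish):
    """Inclusive range slice over one row's columns, sorted by name."""
    names = [name for name, _ in sorted_columns]
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return sorted_columns[lo:hi]


# One row (key "AAPL"); its columns are kept sorted by date string.
aapl_row = sorted({
    "2010-04-13": 242,
    "2010-04-14": 245,
    "2010-04-16": 247,  # made-up value for illustration
}.items())

owner = node_for("AAPL")  # hashing decides only where the row lives
in_range = slice_columns(aapl_row, "2010-04-14", "2010-04-15")
print(in_range)  # [('2010-04-14', 245)]
```

The hash scrambles the order of row keys across nodes, but the date slice
never looks at it: it only relies on the ordering of columns inside the row.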


> Examples are:
> Query 1: Give me the data between date1 and date2 for a set of tickers
> (say, the 100 tickers in QQQ).
>
You could use http://wiki.apache.org/cassandra/API#multiget_slice for this.
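
As a sketch of the shape of such a call, here is an in-memory stand-in in
Python. It is not the Thrift client: the function below only borrows the
name multiget_slice, its parameters are simplified by me, and the MSFT
numbers and the "NOSUCH" key are made up. The idea it shows is one request
carrying many row keys plus one date range, answered with a map from each
key to its matching columns.

```python
def multiget_slice(store, keys, start, finish):
    """Toy stand-in: return {key: [(date, attributes), ...]} per row key.

    Dates are compared as ISO strings, so lexical order is date order.
    Keys with no row get an empty list in this model.
    """
    result = {}
    for key in keys:
        row = store.get(key, {})
        result[key] = [
            (day, row[day]) for day in sorted(row) if start <= day <= finish
        ]
    return result


# Design 2 layout: ticker -> date (super column) -> attributes.
store = {
    "AAPL": {
        "2010-04-13": {"closingPrice": 242, "volume": 10.9e6},
        "2010-04-14": {"closingPrice": 245, "volume": 14.4e6},
    },
    "MSFT": {  # made-up values for illustration
        "2010-04-14": {"closingPrice": 30, "volume": 50e6},
    },
}

tickers = ["AAPL", "MSFT", "NOSUCH"]
result = multiget_slice(store, tickers, "2010-04-14", "2010-04-14")
for ticker in tickers:
    print(ticker, [day for day, _ in result[ticker]])
```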


> Query 2: More often than not, the query is: Give me the data for the max
> available dates (for each ticker) between date1 and date2 in a set of
> tickers.
> (Since not every day is traded, and we only want the most recent data,
> given a range of dates.)
>
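A SliceRange (http://wiki.apache.org/cassandra/API#SliceRange) allows you to
specify limits and ordering for the columns you are slicing. Slicing in
reversed order from date2 back to date1 with a count of 1 returns only the
most recent date in the range.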
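
Here is a small Python model of how the ordering and limit settings combine
for Query 2. Again this is a sketch, not client code: slice_range and its
is_reversed parameter are my own names (the latter mirrors the reversed
field of SliceRange), and the MSFT row is made up. It assumes that in
reversed order the start is the upper bound and the finish the lower bound.

```python
def slice_range(row, start, finish, is_reversed=False, count=100):
    """Toy slice over one row: inclusive bounds, optional reverse, limit."""
    names = sorted(row, reverse=is_reversed)
    # In reversed order the slice runs from start (high) down to finish (low).
    low, high = (finish, start) if is_reversed else (start, finish)
    selected = [name for name in names if low <= name <= high]
    return [(name, row[name]) for name in selected[:count]]


rows = {
    "AAPL": {
        "2010-04-13": {"closingPrice": 242},
        "2010-04-14": {"closingPrice": 245},
    },
    "MSFT": {"2010-04-12": {"closingPrice": 30}},  # made-up value
}

date1, date2 = "2010-04-10", "2010-04-18"

# Most recent available date per ticker within [date1, date2].
latest = {
    ticker: slice_range(row, start=date2, finish=date1,
                        is_reversed=True, count=1)
    for ticker, row in rows.items()
}
print(latest["AAPL"])  # [('2010-04-14', {'closingPrice': 245})]
```

Each ticker gets at most one column back, namely the newest date it has in
the range, even when the tickers last traded on different days.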
