It sounds to me like using Cassandra is a good idea in your case. I figured I
would help, since we Europeans need to cooperate a bit, even though I have only
worked with Cassandra for a month. =)

1:
What is the query you want to use when charting the data? Use it to decide how 
to store and sort your data.
2:
Where is your row? You must model it correctly. I added my explanation here: 
http://x0613.orbbox.com/blog/662/8567/ (http://www.justus.st/)
The layout is either
SCF-ROW-SC-C (super column family, row, super column, column)
or
CF-ROW-C (column family, row, column)
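
To make the two layouts concrete, here is a rough sketch using plain Python
dicts. This is not a Cassandra client, only a picture of the nesting. It
assumes the device ID is the row key and the measurement time is the column
(or super column) name; the names measurements_cf, measurements_scf, read_cf
and read_scf are made up for illustration.

```python
# Hypothetical in-memory picture of the two layouts (plain dicts, no Cassandra).

# CF-ROW-C: column family -> row key -> column name -> value
measurements_cf = {
    "device1": {             # row key (assumed: the device ID)
        1280819205: "10",    # column name = measurement time, value = reading
        1280819305: "15",
    },
}

# SCF-ROW-SC-C: super column family -> row key -> super column -> column -> value
measurements_scf = {
    "device1": {                        # row key (assumed: the device ID)
        1280819205: {"value": "10"},    # super column = measurement time
        1280819305: {"value": "15"},
    },
}


def read_cf(cf, row_key, column):
    """Look up one column in the CF-ROW-C layout."""
    return cf[row_key][column]


def read_scf(scf, row_key, super_column, column):
    """Look up one subcolumn in the SCF-ROW-SC-C layout."""
    return scf[row_key][super_column][column]
```

With only one value per measurement, the CF-ROW-C picture already holds
everything; the extra super column level only pays off if each measurement
carries several fields.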
3:
There are some limitations:
2 GB of data in a row in 0.6, 2 billion columns in a row in 0.7.
And
a row must fit on a single node.
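
As a rough check against the column limit, here is a small back-of-the-envelope
calculation. It assumes one row per device, one column per measurement and one
measurement per minute, as in your description; the function name
years_until_row_full is just for illustration.

```python
MINUTES_PER_YEAR = 60 * 24 * 365        # one measurement per minute, 365 days
MAX_COLUMNS_PER_ROW = 2_000_000_000     # the 0.7 limit mentioned above


def years_until_row_full(columns_per_year=MINUTES_PER_YEAR,
                         limit=MAX_COLUMNS_PER_ROW):
    """Years of data a single row can hold before hitting the column limit."""
    return limit / columns_per_year


print(MINUTES_PER_YEAR)              # 525600
print(int(years_until_row_full()))   # 3805
```

So under these assumptions the column count is not the constraint; the size of
the row and the requirement that it fits on one node matter more.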
4:
"For my range-selections - I think I need the OrderPreservingPartitioner. Right?"
I don't think you need it; just sort the columns by measurement time. You do
not need it because an entire row is always stored on the same node, while
OrderPreservingPartitioner is about keeping row keys in order.
You will have to check again how columns and super columns are sorted. I
haven't added my bookmarks to the blog yet, but http://www.sodeso.nl/?p=421
was a good source of information, I think. There is more on the same blog
as well.
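
To illustrate why a time range inside one row does not depend on the
partitioner, here is a minimal sketch in plain Python (no Cassandra involved).
It assumes the columns of a row are kept sorted by measurement time; the
function slice_columns and the sample data are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, finish):
    """Return the (time, value) pairs with start <= time <= finish.

    `row` stands in for one device's row: a list of (time, value) pairs
    already sorted by time, the way sorted columns would be.
    """
    times = [t for t, _ in row]
    lo = bisect_left(times, start)     # first column at or after start
    hi = bisect_right(times, finish)   # one past the last column <= finish
    return row[lo:hi]


device1 = [(1280819205, "10"), (1280819305, "15"), (1280819405, "10")]
print(slice_columns(device1, 1280819300, 1280819405))
# [(1280819305, '15'), (1280819405, '10')]
```

The range is taken over column names within a single row, so the order of the
row keys across nodes never comes into it.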
5:
There are always alternative designs. You should not give up too early, as this
is one of the most important decisions.
6:
Have a nice day, Stefan.

/Justus



-----Original message-----
From: Stefan Kaufmann [mailto:sta...@gmail.com] 
Sent: 3 August 2010 09:21
To: user@cassandra.apache.org
Subject: Using Cassandra for storing measurement data

Dear Cassandra Users,

I'm quite new to Cassandra and I'm still trying to figure out whether I'm
on the right path for my requirements.
I'd like to explain my Cassandra design and hope to receive feedback on
whether it would work.

I'd like to use Cassandra to store measurement data from several
devices. Each device reports every minute, so there will be about 500 000
entries per device every year.
The following data has to be stored:
 - device ID
 - measurement Time (of course different to the Cassandra time-stamp)
 - measurement value

Later, the data should be charted - so I need to select time-ranges
from a device.



My solution is currently a super column:
{
    name: "device1",
    value: {
        // measurement timestamps..
        1280819205: {name: "value", value: "10", timestamp: 123456789},
        1280819305: {name: "value", value: "15", timestamp: 123456789},
        1280819405: {name: "value", value: "10", timestamp: 123456789},
        //there will be millions of entries
    }
    name: "device2",
    value: {
        // measurement timestamps..
        1280819205: {name: "value", value: "20", timestamp: 123456789},
        1280819305: {name: "value", value: "15", timestamp: 123456789},
        1280819405: {name: "value", value: "20", timestamp: 123456789},
         //there will be millions of entries
    }
}

My questions:
My main concern is the huge number of subcolumns I'm using. All the
Cassandra examples I have seen on the web used them to store only a few
columns (like a user profile).
So would this work with millions of entries?

For my range-selections - I think I need the OrderPreservingPartitioner. Right?

Are there alternative designs? Maybe one without a Super-column? I
can't think of one..

I'm looking forward to some answers,
Thanks in advance,
Stefan
