Hey everyone,
I'm new to Cassandra, going my first steps, having a problem/question regarding
sorting results and proper data modelling.
First of all, I read the article "We Shall Have Order!" by Aaron Ploetz (1) to
get a first view on how Cassandra works.
I reproduced the example in the article with my own table.
DROP TABLE sensors;
CREATE TABLE sensors (
timestamp BIGINT,
name VARCHAR,
value VARCHAR,
unit VARCHAR,
PRIMARY KEY (name, timestamp)
)
WITH gc_grace_seconds = 0
AND CLUSTERING ORDER BY (timestamp DESC);
I'm actual running Cassandra on a single node ([cqlsh 5.0.1 | Cassandra 3.11.0
| CQL spec 3.4.4 | Native protocol v4]).
Now some background information about my project:
I want to store all kinds of measuring-data from all kinds sensors. No matter
if the sensor is measuring a temperature, water flow, or whatever. Sensors
always give a single value. Interpretation has to be done afterwards by the
user.
So in my example, I 'm measuring temperatures of my house which leads me to the
following data:
timestamp name value unit
2017-07-24 14-11-00 entrance-a 20 Celsius
2017-07-24 14-11-04 living-room 24 Celsius
2017-07-24 14-11-07 bath-room 22 Celsius
2017-07-24 14-11-15 bed-room 23 Celsius
2017-07-24 14-11-22 entrance-b 20 Celsius
I'm measuring time-triggered each 15 minutes. In order to have some kind of
start and end for each process, I decided to measure the entrance twice with
different named sensors (entrance a and b). So above is one set of
measuring-data, created by a single process.
I'd say this is just another perfect example of what Aaron Ploetz describes in
his article.
When I query Cassandra the result set is not sorted by timestamp as long as I
won't use the primary key in my WHERE clause.
When I ask myself: "What will I query Cassandra for?" I'm always coming up with
the same typical thoughts:
* LIST all measuring's in a specific timespan ORDERED BY timestamp
ASC/DESC
o Requires ALLOW FILTERING
o Won't be sorted
* LIST all measuring's for a specific sensor ORDERED BY timestamp
ASC/DESC
o Sorted result. OK.
* And stuff the future will bring which I simply don't know now.
So in order to query Cassandra for measuring's in a specific timestamp I can't
find a solid solution. My first idea was:
* Add a column sequence which can be used to bundle a set of measuring's
DROP TABLE sensors;
CREATE TABLE sensors (
timestamp BIGINT,
name VARCHAR,
value VARCHAR,
unit VARCHAR,
sequence INT,
PRIMARY KEY (sequence, timestamp)
)
WITH gc_grace_seconds = 0
AND CLUSTERING ORDER BY (timestamp DESC);
o I won't need to measure the entrance twice
o I can query for a timespan as long as the timespan is within a sequence.
? But when I query a timespan containing more than a single sequence, then the
result set is not correct sorted again
sequence timestamp name value unit
123 2017-07-24 14-11-22 entrance-b 20 Celsius
123 2017-07-24 14-11-15 bed-room 23 Celsius
123 2017-07-24 14-11-07 bath-room 22 Celsius
123 2017-07-24 14-11-04 living-room 24 Celsius
123 2017-07-24 14-11-00 entrance-a 20 Celsius
124 2017-07-24 15-11-22 entrance-b 22 Celsius
124 2017-07-24 15-11-15 bed-room 25 Celsius
124 2017-07-24 15-11-07 bath-room 24 Celsius
124 2017-07-24 15-11-04 living-room 26 Celsius
124 2017-07-24 15-11-00 entrance-a 22 Celsius
o Besides: it's not recommended to use a "dummy" column especially not as
primary or clustering key.
How to solve this problem?
I believe, I can't be the only one who got this requirement. Imho "Sort it on
the client-side" can't be the solution. As soon as data gets bigger we simply
can't "just" sort on a client side.
So my next idea was to use the table as overall data storage and create another
table and periodically transfer data from the main to the child table. But I
believe I'll get the same problem because Cassandra simply don't sort as an
RDBMS. So here must be an idea behind the philosophy of Cassandra.
Can anyone help me out?
Best regards
Mike Wenzel
(1) https://www.datastax.com/dev/blog/we-shall-have-order