Hi Junaid,
I wrote a blog post a few months ago on massively scalable time series, going
into a couple of techniques for bucketing that you might find helpful:
http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
Hi Junaid,
Using a "bucketing" key ("day") is the recommended way to limit the size of
partitions. In your case you would probably need something like:
PRIMARY KEY ((deviceid, day), datetime)
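The advice above can be sketched in CQL; the column names and types here are assumptions for illustration, not the poster's actual schema:

```sql
-- Hypothetical bucketed time-series table: the (deviceid, day) composite
-- partition key caps each partition at one device-day of readings.
CREATE TABLE sensor_readings (
    deviceid text,
    day      date,
    datetime timestamp,
    value    double,
    PRIMARY KEY ((deviceid, day), datetime)
);

-- Range queries then always name the bucket:
SELECT * FROM sensor_readings
WHERE deviceid = 'dev-42' AND day = '2017-08-02'
  AND datetime >= '2017-08-02 00:00:00+0000';
```

A query spanning several days would issue one such SELECT per bucket.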
Have you considered computing a running aggregate as the data comes into
Cassandra? Rather than execute
We are building an IoT platform where time series data from millions of devices
is collected and then used for analytics pertaining to Business
Intelligence/Analytics (BI/BA).
Within the above context, we are running into the issue of having range-based
queries, where the granularity of
as the primary key of the table.
>> But, I realized that this may cause really wide rows ( tracking for 24
>> hours means 96 records inserted (1 for each 15 min window), over 1 year
>> this means 36k records per user, over 2 years, 72k, etc).
>>
>> I know the limit of wide rows is billions of records, but I've heard
>> that the practical limit is much lower.
>>
>> So I considered using a composite primary key: (user, timestamp)
>>
>> If I'm correct, the above should create a new row for each user &
>> timestamp logged.
>>
>> However, will I still be able to do range queries on the timestamp, to
>> e.g. return the data for the last week?
>>
>> E.g select * from data where user_id = 'foo' and timestamp >= '<1 month
>> ago>' and timestamp <= '' ?
>>
>>
>>
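For reference, the layout being asked about can be sketched in CQL (table and column names are taken loosely from the question; the payload column is an assumption). With this key, user_id is the partition key and timestamp a clustering column, so the range query in the question is supported:

```sql
-- One partition per user; rows ordered by timestamp within the partition.
CREATE TABLE data (
    user_id   text,
    timestamp timestamp,
    payload   text,
    PRIMARY KEY (user_id, timestamp)
);

-- Range scan over the clustering column:
SELECT * FROM data
WHERE user_id = 'foo'
  AND timestamp >= '2014-05-24' AND timestamp <= '2014-06-24';
```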
Cheers,
Jens
On Tue, Jun 24, 2014 at 10:09 AM, Mike Carter jaloos...@gmail.com wrote:
Hello!
I'm a beginner in C* and I'm quite struggling with it.
I’d like to measure the performance of some Cassandra range queries. The
idea is to execute multidimensional range queries on Cassandra. E.g. there
is a given table of 1 million rows with 10 columns, and I'd like to execute
some queries like “select count(*) from testable where d = 1 and v1 < 10 and
v2 < 20 and v3 < 45 and v4 < 70 … allow filtering”. This kind
(irrespective of RDBMS or NoSQL).
Remember, every search request on secondary indexes will be passed to each
node in the ring.
-Vivek
On Tue, Aug 27, 2013 at 11:11 PM, Sávio Teles
savio.te...@lupa.inf.ufg.br wrote:
Use a database that is designed for efficient range queries? ;D
Is there no way to do this with Cassandra? Like using Hive, Solr...
We have 700,000 rows.
I've indexed the salary, age and gender attrs.
It takes about 20 minutes.
2013/8/27 Alain RODRIGUEZ arodr...@gmail.com
Can you send us the result of a "describe columnfamily users"?
How many rows are present in this table?
Do you have indexes defined?
What is a long time?
On Fri, Aug 23, 2013 at 5:53 AM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote:
I need to perform range query efficiently.
...
This query takes a long time to run. Any ideas to perform it efficiently?
Use a database that is designed for efficient range queries? ;D
=Rob
Use a database that is designed for efficient range queries? ;D
Is there no way to do this with Cassandra? Like using Hive, Solr...
Oops, inverted index*!
2013/8/26 Sávio Teles savio.te...@lupa.inf.ufg.br
Do I have to use a revert index to optimize range query operation?
2013/8/23 Sávio Teles savio.te...@lupa.inf.ufg.br
I need to perform range query efficiently. I have the table like:
users
---
user_id | age | gender | salary | ...
The attr user_id is the PRIMARY KEY.
Example of querying:
select * from users where user_id = 'x' and age > y and age < z and
salary > a and salary < b and gender = 'M';
(which is directly related to range
queries, too).
On Wed, Jun 26, 2013 at 3:05 AM, Colin Blower cblo...@barracuda.com wrote:
You could just separate the history data from the current data. Then
when the user's result is updated, just write into two tables.
CREATE TABLE all_answers (
    user_id uuid,
    created timeuuid,
    result text,
    question_id varint,
    PRIMARY KEY (user_id, created)
)
CREATE TABLE current_answers
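The snippet cuts off at the second table. A plausible, purely hypothetical completion, keyed so that each new answer overwrites the previous one for that question:

```sql
-- Hypothetical current_answers table: exactly one row per (user, question);
-- writing a newer answer simply overwrites the old row in place.
CREATE TABLE current_answers (
    user_id     uuid,
    question_id varint,
    created     timeuuid,
    result      text,
    PRIMARY KEY (user_id, question_id)
);
```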
Yes, that makes sense and that article helped a lot, but I still have a few
questions...
The created_at in our answers table is basically used as a version id.
When a user updates his answer, we don't overwrite the old answer, but
rather insert a new answer with a more recent timestamp (the
Hello,
We are considering using Cassandra and I want to make sure our use case
fits Cassandra's strengths. We have the table like:
answers
---
user_id | question_id | result | created_at
Where our most common query will be something like:
SELECT * FROM answers WHERE user_id = 123 AND
I think you'd be better served with a slightly different primary
key.
If your primary key was (user_id, created_at) or (user_id, created_at,
question_id), then you'd be able to run the above query without a problem.
This will mean that the entire pantheon of a specific user_id will be
Interesting, thank you for the reply.
Two questions though...
Why should created_at come before question_id in the primary key? In other
words, why (user_id, created_at, question_id) instead of (user_id,
question_id, created_at)?
Given this setup, all a user's answers (all 10k) will be stored
So, if you want to grab by created_at and occasionally limit by
question_id, that is why you'd put created_at first.
The way primary keys work is: the first part of the primary key is the
partitioner key; that field is essentially what constitutes the single Cassandra row.
The second key is the order
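The scheme described above can be sketched in CQL (types are assumptions, not the poster's exact schema):

```sql
-- user_id is the partition key; created_at orders rows within the partition,
-- so per-user time-range queries are a plain clustering-column scan.
CREATE TABLE answers (
    user_id     int,
    created_at  timestamp,
    question_id varint,
    result      text,
    PRIMARY KEY (user_id, created_at, question_id)
);

SELECT * FROM answers
WHERE user_id = 123
  AND created_at >= '2013-06-01' AND created_at < '2013-07-01';
```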
AND
read_repair_chance=0.10 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX temp_min_update_idx ON temp (min_update);
Range queries are working fine on the primary key.
I am getting the same error on another query of another table, temp2:
select * from temp2 where reffering_url='Some URL';
This table also has a secondary index on this field (reffering_url).
Any help would be appreciated.
Hello,
I have what is perhaps a silly question.
Column family other2 which has a varchar as primary key and a uuid column.
I have inserted 2000 rows.
All row keys start with 'nl' followed by other characters.
To my surprise, when I do: select count(*) from other2 where key < 'z';
It shows:
What am I doing wrong here?
You are probably using a RandomPartitioner (or Murmur3Partitioner) which
randomize keys to avoid hot spots.
Basically, you just can't use range queries, because 'nlxx' is stored
as md5('nlxx'). You would do better to modify your model to use column
slices,
Things you can find searching on the web :
http://wiki.apache.org/cassandra/DataModel#Range_queries
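To illustrate the point with the other2 table (a sketch in modern CQL syntax, assuming the layout described): under the Random/Murmur3 partitioners, partition keys are ordered by hashed token, so range predicates on the key only make sense through token():

```sql
-- Keys come back in token order, not in 'nl...' lexical order:
SELECT token(key), key FROM other2 LIMIT 10;

-- A range predicate on the partition key must be expressed over tokens:
SELECT * FROM other2 WHERE token(key) > token('nl');

-- Lexically ordered ranges belong on clustering columns instead.
```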
So you're basically saying that read consistency levels do affect range
queries? With high enough level I should be able to get the 'correct' data
regardless of some node(s) being behind-the-times?
My first read through https://issues.apache.org/jira/browse/CASSANDRA-967
left me with impression
Hi,
I'm somewhat lost in regards to the results I can expect from running range
queries in a (temporarily) 'inconsistent' cluster (e.g. if node has been
down for some time and hasn't caught up yet).
Suppose I have 4 nodes in 2 DCs (cassandra 1.1.7):
DCa: a1 and a2
DCb: b1 and b2
I'm using
Range queries do not currently read repair, although there is a ticket
on this. If you want them to be consistent, do them at QUORUM or ALL.
But in a strange quirk, since get_range_slice does not repair, those
operations are not eventually consistent
On Thu, Feb 7, 2013 at 10:20 AM, Sergey Olefir
not execute for arbitrary lengths of time.
On Thu, Nov 15, 2012 at 6:39 AM, Ravikumar Govindarajan
ravikumar.govindara...@gmail.com wrote:
Usually we do a SELECT * FROM ORDER BY LIMIT 26,25 for pagination
purposes, but specifying an offset is not available for range queries in
Cassandra.
I always have to specify a start-key to achieve this. Are there reasons
for choosing such an approach rather than providing an absolute offset?
--
Ravi
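The start-key workaround the poster mentions looks roughly like this (hypothetical table and key names; a sketch, not code from the thread):

```sql
-- No absolute OFFSET in Cassandra: page by remembering the last key seen.
SELECT * FROM users LIMIT 25;                               -- first page
SELECT * FROM users
WHERE token(user_id) > token('last_key_of_previous_page')   -- next page
LIMIT 25;
```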
Ok, sorry, I thought columns inside a row had their keys hashed also.
So they are just stored as raw bytes.
Thx
2012/6/1 aaron morton aa...@thelastpickle.com
If you hash 4 composite keys, let's say
('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), you have only 4
hashes or you have
Thx for the answer
One more thing: a composite key is not hashed only once, I guess?
Is it hashed once per part of the composite?
So this means there are twice or three times (or ...) as many keys as for normal column
keys, is that true?
Le 31 mai 2012 02:59, aaron morton aa...@thelastpickle.com a écrit :
It is hashed once.
To the partitioner it's just some bytes. Other parts of the code care about its
structure.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
but sorry, I don't understand
If you hash 4 composite keys, let's say
('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), you have only 4
hashes or you have more?
If it's 4, how come you are able to range query for example between
start_column=('A', 'D') and end_column=('A','E') and get
If you hash 4 composite keys, let's say ('A','B','C'), ('A','D','C'),
('A','E','X'), ('A','R','X'), you have only 4 hashes or you have more?
Four
If it's 4, how come you are able to range query for example between
start_column=('A', 'D') and end_column=('A','E') and get this column
Composite Columns compare each part in turn, so the values are ordered as
you've shown them.
However, the rows are not ordered according to key value. They are ordered using
the random token generated by the partitioner; see
http://wiki.apache.org/cassandra/FAQ#range_rp
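The same behaviour carries over to CQL clustering columns (a sketch with hypothetical names): the row key is hashed once by the partitioner, while the composite parts are compared part by part and stay range-queryable inside a row:

```sql
CREATE TABLE parts (
    key text,   -- hashed once by the partitioner
    p1  text,   -- composite parts: compared in turn, stored sorted
    p2  text,
    PRIMARY KEY (key, p1, p2)
);

-- Slice between ('A','D') and ('A','E') within one row:
SELECT * FROM parts
WHERE key = 'k1'
  AND (p1, p2) >= ('A', 'D') AND (p1, p2) <= ('A', 'E');
```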
What is the real
How is it done in Cassandra to be able to range query on a composite key?
key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B,A,C)
like get_range (key1, start_column=(A,), end_column=(A, C)); will return
[ (A:B:C), (A:C:C) ] (in pycassa)
I mean does the composite implementation add much overhead to
the random partitioner for data partitioning.
Actually what I am trying to achieve is to have range queries using the
RandomPartitioner. I am stuck and not getting any help to do it.
I created my own column family using Column Families as Indexes. It is
written on page 14 of Apache Cassandra
is the
tweet_id
-Jeremiah
From: John Laban [j...@pagerduty.com]
Sent: Wednesday, March 14, 2012 12:37 PM
To: user@cassandra.apache.org
Subject: Re: Composite keys and range queries
Hmm, now I'm really confused.
This may be of use to you
http://www.datastax.com/dev/blog/schema
use range queries on composite row
keys, even when using a RandomPartitioner, if I make sure that the first
part of the composite key is fixed?
Any help would be appreciated,
John
On Tue, Mar 13, 2012 at 12:15 PM, John Laban j...@pagerduty.com wrote:
Hi,
I have a column family
Forwarding to the Cassandra mailing list as well, in case this is more of
an issue on how I'm using Cassandra.
Am I correct to assume that I can use range queries on composite row keys,
even when using a RandomPartitioner, if I make sure that the first part of
the composite key is fixed?
Any
The digest is based on the results of the same query as applied on
different replicas. See the following for more details:
http://wiki.apache.org/cassandra/ReadRepair
http://www.datastax.com/docs/1.0/dml/data_consistency
On Wed, Nov 30, 2011 at 11:38 PM, Thorsten von Eicken
t...@rightscale.com
Looking at the docs, I can't conclusively answer this question:
Suppose I make this CQL query with consistency factor 1 and read-repair
100%:
select 'a'..'z' from cf where key = 'xyz' limit 5;
Suppose the node I connect to has the key and responds with (improvised
syntax):
['a'-0, 'c'-2, 'e'-4,
Following your suggestion of using the key of the super column as a range token,
won't I have a storage problem?
You won't get me to proclaim that you won't have a storage problem ;)
If you're going to deploy this at scale, I'm sure you'll have problems
whatever you do...
I couldn't find information
I found this on the wiki, may be useful: http://wiki.apache.org/cassandra/LargeDataSetConsiderations
Aaron
On 24 Jan, 2011, at 09:26 PM, Peter Schuller peter.schul...@infidyne.com wrote:
Following your suggestions, of using key of super column as range token
won't I have a storage problem?
You won't
Range queries on keys is possible when using the order preserving
partitioner; see the partitioner section of:
http://wiki.apache.org/cassandra/StorageConfiguration
In addition, range slicing is supported within a single row
Hello everyone,
I'm new to dynamo. I'm looking to implement something similar to prefix
search for keys (much like S3 allows you to list all the keys that match a
certain prefix).
Can I implement this with Cassandra? I'm using Hector as the client but
would gladly go thrift if necessary.
Thank
Is it possible to perform paginated queries using Random Partitioner in 0.7
with Super Column Families whose Super Columns are UUID's? I don't believe
it is, based on this article:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner,
and my attempts
The Partitioner applies to the row keys, not the columns. Their order is determined by the compare_with and compare_subcolumns_with CF settings. So where you say "get the last 25 inserts for a key", I'm translating that into "get the most recent 25 super columns for a row, where the super column
Hey Aaron,
Yes, in regards to SCF definition, you are correct:
name: Sensor
column_type: Super
compare_with: TimeUUIDType
gc_grace_seconds: 864000
keys_cached: 1.0
read_repair_chance: 1.0
rows_cached: 0.0
I'm not quite sure I follow you, though,
When you say "I want to get rows starting from a Super Column..." it's a bit confusing. Do you want to get super columns from a single row, or multiple rows? I'm assuming you are talking about getting columns from a single row / key, as that's what your code does. For the Pelops code, it looks OK
Actually, it was a class issue at this line:
System.out.println("NAME: " + UUID.nameUUIDFromBytes(col.getName()));
The native Pelops class timeUuidHelper is what should be used.
On Wed, Dec 1, 2010 at 4:16 PM, Aaron Morton aa...@thelastpickle.com wrote:
Using the methods on the Bytes class would be preferable. The byte[]-related
methods on UuidHelper should have been deprecated when the Bytes
class was introduced...
e.g. new Bytes(col.getName()).toUuid()
Cheers,
Dan
On Thu, Dec 2, 2010 at 10:26 AM, Frank LoVecchio fr...@isidorey.com wrote:
Hi,
I am trying to iterate over the entire dataset to calculate some
information. Now the way I am trying to do this is by going directly to the
node that has a data range, so here is the route I am following
- get the TokenRange using describe_ring
- then for each TokenRange pick a node and
Hi,
I've used range queries with the Order Preserving Partitioner and got
satisfactory results.
For instance, I can find the first 1 million keys that start with key
'2008010100' and end with '2008010200'.
Now I'm trying to do the same with Random Partitioning. But here I find that
for Range