How to organize a timeseries by device?

2015-11-09 Thread Guillaume Charhon
Hello, We are currently storing geolocation events (about 1 per 5 minutes) for each device we track. We currently have 2 TB of data. I would like to store the device_id, the timestamp of the event, latitude and longitude. I thought about using the device_id as the partition key and timestamp as

Re: How to organize a timeseries by device?

2015-11-09 Thread Jack Krupansky
The general rule in Cassandra data modeling is to look at all of your queries first and then to declare a table for each query, even if that means storing multiple copies of the data. So, create a second table with bucketed time as the partition key (hour, 15 minutes, or whatever time interval
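
A minimal sketch of the time-bucketing idea for such a second table, assuming a 15-minute interval (one of the options mentioned above). The function names and the choice of UTC are illustrative assumptions, not something from the thread. It floors a timestamp to the start of its bucket and lists every bucket that a query over a time range would need to read:

```python
from datetime import datetime, timedelta, timezone

# Assumed bucket width; the thread suggests an hour, 15 minutes, or another interval.
BUCKET_WIDTH = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET_WIDTH):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    whole_buckets = (ts - EPOCH) // width  # timedelta // timedelta gives an int
    return EPOCH + whole_buckets * width


def buckets_in_range(start, end, width=BUCKET_WIDTH):
    """Yield the start of every bucket that overlaps the half-open range [start, end)."""
    current = bucket_start(start, width)
    while current < end:
        yield current
        current += width


if __name__ == "__main__":
    begin = datetime(2015, 11, 9, 17, 7, tzinfo=timezone.utc)
    finish = datetime(2015, 11, 9, 17, 40, tzinfo=timezone.utc)
    for bucket in buckets_in_range(begin, finish):
        # Each value would be one partition key to query in the time-bucketed table.
        print(bucket.isoformat())
```

A reader would issue one query per yielded bucket, so a narrower bucket means smaller partitions but more queries per time range.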

Re: How to organize a timeseries by device?

2015-11-09 Thread Kai Wang
1. Don't make your partition unbounded. It's tempting to just use (device_id, timestamp), but sooner or later you will have problems as time goes by. You can keep the partition bounded by using (device_id, bucket, timestamp). Use hour, day, month or even year like Jack mentioned depending on the size
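
A small sketch of how a client might derive the bucket value for a (device_id, bucket, timestamp) layout, assuming a per-day bucket stored as a `YYYYMMDD` string in UTC. The bucket format, the helper names, and the sample device id are assumptions for illustration; the thread leaves the bucket type and size open:

```python
from datetime import datetime, timezone


def day_bucket(ts):
    """Return the UTC day of a timezone-aware timestamp as a 'YYYYMMDD' string.

    Note: a naive datetime would be interpreted as local time by astimezone().
    """
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def partition_key(device_id, ts):
    """Build the (device_id, bucket) pair that bounds a partition to one device-day."""
    return (device_id, day_bucket(ts))


if __name__ == "__main__":
    event_time = datetime(2015, 11, 9, 23, 59, tzinfo=timezone.utc)
    # 'device-42' is a made-up id used only for this example.
    print(partition_key("device-42", event_time))
```

With this scheme all events for one device on one UTC day share a partition, and the timestamp orders rows within it.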

Re: Can't save Opscenter Dashboard

2015-11-09 Thread Kai Wang
Finally I got this one resolved. I sent feedback via Help->Feedback on the OpsCenter page. Someone is actually reading those - imagine that. Big +1 to Datastax. Here is the fix: first visit this URL: http://your_ip:your_port/Test_Cluster/rc/dashboard_presets/ you should get a response like this:

Re: Best way to recreate a cassandra node with data

2015-11-09 Thread Eric Stevens
Check nodetool status to see if the replacement node is fully joined (UN status). If it is and it didn't stream any data, then either auto_bootstrap was false, or the node was in its own seeds list. If you lost a node, then replace_address as Jonny mentioned would probably be a good idea. On

Re: Cassandra compaction stuck? Should I disable?

2015-11-09 Thread Robert Coli
On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . < th3penguinwhispe...@gmail.com> wrote: > > In Opscenter I see one of the nodes is orange. It seems like it's working > on compaction. I used nodetool compactionstats and whenever I did this the > Completed and percentage stays the same (even

Re: Unable to bootstrap another DC in my cluster

2015-11-09 Thread Robert Coli
On Mon, Nov 9, 2015 at 12:08 PM, K F wrote: > As I am trying to bring up a new DC in my cluster, my first seed node that > I bring-up in the new DC that I am adding to the existing cluster. It's not > able to receive reply back for the GossipDigestSyn request sent to other >

Re: How to organize a timeseries by device?

2015-11-09 Thread Guillaume Charhon
Is it usually recommended to use a timestamp or a string for the bucket key (usually a 5-minute period in my case) of the events_by_time table? On Mon, Nov 9, 2015 at 5:05 PM, Kai Wang wrote: > it depends on the size of each event. You want to bound each partition

Re: Do I have to use the cql in the datastax java driver?

2015-11-09 Thread Robert Coli
On Sun, Nov 8, 2015 at 6:57 AM, Jonathan Haddad wrote: > You shouldn't use thrift, it's effectively dead. > > On Fri, Nov 6, 2015 at 10:30 PM Dikang Gu wrote: > >> Can I still use thrift interface to talk to cassandra? Any reason that we >> should not

Re: How to organize a timeseries by device?

2015-11-09 Thread Kai Wang
The bucket key is just like any column of the table; you can use any type as long as it's convenient for you to write the query. But I don't think you should use 5 minutes as your bucket key since you only have 1 event every 5 minutes. A 5-minute bucket seems too small. The bucket key we mentioned is for

Re: Best way to recreate a cassandra node with data

2015-11-09 Thread Robert Coli
On Sun, Nov 8, 2015 at 9:11 PM, John Wong wrote: > If we recreate an instance with the same IP, what is the best way to get > the node up and running with the previous data? Right now I am relying on > backup. > replace_address if you don't mind decreasing unique replica

Re: How to organize a timeseries by device?

2015-11-09 Thread Guillaume Charhon
For the first table: (device_id, timestamp), should I add a bucket even if I know I might have millions of events per device but never billions? On Mon, Nov 9, 2015 at 4:37 PM, Jack Krupansky wrote: > Cassandra is good at two kinds of queries: 1) access a specific row

Re: How to organize a timeseries by device?

2015-11-09 Thread Kai Wang
It depends on the size of each event. You want to keep each partition under ~10MB. In system.log look for an entry like: WARN [CompactionExecutor:39] 2015-11-07 17:32:00,019 SSTableWriter.java:240 - Compacting large partition :9f80ce31-b7e7-40c7-b642-f5d03fc320aa (13443863224 bytes) This is
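
A back-of-the-envelope sketch of how bucket size relates to the ~10MB guideline, using the one-event-per-5-minutes rate from the original question. The 100 bytes per event is a placeholder assumption, not a measured value; real on-disk size depends on the schema and storage overhead:

```python
EVENT_INTERVAL_SECONDS = 5 * 60           # about 1 event per 5 minutes per device (from the thread)
ASSUMED_BYTES_PER_EVENT = 100             # placeholder assumption; measure your own rows
PARTITION_TARGET_BYTES = 10 * 1000 * 1000  # the ~10MB guideline from the thread

# Candidate bucket widths in seconds (month and year are approximations).
BUCKET_SECONDS = {
    "5 minutes": 5 * 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "month": 30 * 24 * 60 * 60,
    "year": 365 * 24 * 60 * 60,
}


def events_per_bucket(bucket_seconds, interval_seconds=EVENT_INTERVAL_SECONDS):
    """Number of events one device produces during one bucket."""
    return bucket_seconds // interval_seconds


def estimated_partition_bytes(bucket_seconds, bytes_per_event=ASSUMED_BYTES_PER_EVENT):
    """Rough partition size for one (device, bucket) pair."""
    return events_per_bucket(bucket_seconds) * bytes_per_event


if __name__ == "__main__":
    for name, seconds in BUCKET_SECONDS.items():
        size = estimated_partition_bytes(seconds)
        verdict = "within" if size <= PARTITION_TARGET_BYTES else "over"
        print(f"{name}: {events_per_bucket(seconds)} events, ~{size} bytes ({verdict} target)")
```

Under these assumptions a 5-minute bucket holds a single event per device, which illustrates why such a small bucket adds partitions without bounding anything useful.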

Re: How to organize a timeseries by device?

2015-11-09 Thread Guillaume Charhon
Kai, Jack, On 1., should the bucket be a STRING with a date format or do I have a better option? For (device_id, bucket, timestamp), did you mean ((device_id, bucket), timestamp)? On 2., what are the risks of timeout? I currently have this warning: "Cannot execute this query as it might

Re: How to organize a timeseries by device?

2015-11-09 Thread Jack Krupansky
Cassandra is good at two kinds of queries: 1) access a specific row by a specific key, and 2) Access a slice or consecutive sequence of rows within a given partition. It is recommended to avoid ALLOW FILTERING. If it happens to work well for you, great, go for it, but if it doesn't then simply

Unable to bootstrap another DC in my cluster

2015-11-09 Thread K F
Hi folks, As I am trying to bring up a new DC in my cluster, my first seed node that I bring-up in the new DC that I am adding to the existing cluster. It's not able to receive reply back for the GossipDigestSyn request sent to other seeds in the cluster. This is causing the first node to

org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218 throws java.lang.AssertionError

2015-11-09 Thread 李建奇
Hi, All, We have had a 12-node cluster running version 2.1.9 for nearly one month. Last week it threw an exception. After the exception, the cluster's write and read latency went up to 4 seconds from a 0.4 ms average. I suspect OutboundTcpConnection is broken. I try to disablegossip then enablegossip to

Fwd: Cassandra compaction stuck? Should I disable?

2015-11-09 Thread PenguinWhispererThe .
Hi all, In Opscenter I see one of the nodes is orange. It seems like it's working on compaction. I used nodetool compactionstats and whenever I did this the Completed and percentage stays the same (even with hours in between). I currently don't see cpu load from cassandra on that node. So it

[RELEASE] Apache Cassandra 3.0.0 released

2015-11-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.0.0. Top Cassandra 3.0 features: * CQL optimized storage engine and sstable format * Materialized views * More efficient hints Read more about features and upgrade instructions in NEWS.txt[2] The Java

Re: Best way to recreate a cassandra node with data

2015-11-09 Thread Johnny Miller
John - Why not just follow the process for replacing a dead node? Why do you need to use the same IP? e.g. JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node" http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html

Re: Does nodetool cleanup clears tombstones in the CF?

2015-11-09 Thread Johnny Miller
You could also have a look at the JMX forceUserDefinedCompaction call on a specific SSTable > On 5 Nov 2015, at 21:56, K F wrote: > > Thanks Rob, I will look into checksstablegarbage utility. However, I don't > want to run major compaction as that would result in too big

which astyanax version to use?

2015-11-09 Thread Lu, Boying
Hi, All, We plan to upgrade Cassandra from 2.0.17 to 2.1.11 (the latest stable release recommended to be used in the product environment) in our product. Currently we are using Astyanax 1.56.49 as the Java client. I found there are many new Astyanax versions at https://github.com/Netflix/astyanax/releases