Re: Huge number of sstables after adding server to existing cluster

2015-04-04 Thread Anuj Wadehra
We faced a compaction issue with STCS in 2.0.3. Until we upgraded, we added a dummy read every 1000 writes as a workaround; compaction then started happening under write-only heavy loads. Anuj Wadehra

Re: Huge number of sstables after adding server to existing cluster

2015-04-04 Thread graham sanderson
I have not thought through why adding a node would cause this behavior, but https://issues.apache.org/jira/browse/CASSANDRA-8860 and https://issues.apache.org/jira/browse/CASSANDRA-8635 relate

Re: Huge number of sstables after adding server to existing cluster

2015-04-04 Thread Mantas Klasavičius
Thanks a lot to all for your responses. I should mention we are running 2.1.3 and I have already set setcompactionthroughput 0. The nodetool enableautocompaction keyspace table command/bug is new to me; I will definitely try this out and let you know. One more thing I want to clarify: did I unders

Re: Exception while running cassandra stress client

2015-04-04 Thread ankit tyagi
Thanks a lot for helping me out here. I have one more question about the cassandra-stress tool: what exactly is *cluster distribution* in the *column distribution specifications* when defining a YAML-based profile? On Thu, Apr 2, 2015 at 3:03 PM, Abhinav Ranjan wrote: > Hi, > > We too got the same erro

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
Okay, so bucketing by day/week/month is a capacity-planning matter and depends on the actual questions I want to ask. As a conclusion, I have an events table: CREATE TABLE user_plans ( id timeuuid, user_id timeuuid, event_ts timestamp, event_type int, some_other_attr text, PRIMARY KEY (user_id, event_ts) )
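
A hedged sketch of the bucketed variant discussed in this thread (the month bucket column and the table name are assumptions for illustration, not from the original mail):

    CREATE TABLE user_events_by_month (
        user_id         timeuuid,
        month           int,        -- e.g. 201504; assumed bucket column
        event_ts        timestamp,
        event_type      int,
        some_other_attr text,
        PRIMARY KEY ((user_id, month), event_ts)
    );

With this layout a single (user_id, month) partition holds roughly one month of events for one user, and range queries on event_ts stay inside that partition.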

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Jack Krupansky
It sounds like your time bucket should be a month, but it depends on the amount of data per user per day and your main query range. Within the partition you can then query for a range of days. Yes, all of the rows within a partition are stored on one physical node, as well as on the replica nodes.
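
For example, with a composite (user_id, month) partition key like the sketch above (names assumed for illustration), a week of days inside one month's partition can be read with a clustering range:

    SELECT event_ts, event_type, some_other_attr
    FROM user_events_by_month
    WHERE user_id = 62c36092-82a1-11e4-8f7f-6816fbd7a0ed   -- hypothetical user id
      AND month = 201504
      AND event_ts >= '2015-04-01'
      AND event_ts < '2015-04-08';

Because both partition key columns are fixed by equality, the range on event_ts is served from a single partition on its replicas.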

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
>non-equal relation on a partition key is not supported OK, can I generate a select query like: select some_attributes from events where ymd = 20150101 or ymd = 20150102 or ymd = 20150103 ... or ymd = 20150331 > The partition key determines which node can satisfy the query So you mean that all rows with the same *(y
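
CQL has no OR, but an IN list on the partition key expresses the same idea (assuming a schema where ymd, an integer like 20150101, is the partition key column; table and column names follow the mail):

    SELECT some_other_attr
    FROM events
    WHERE ymd IN (20150101, 20150102, 20150103, 20150331);

Each value in the IN list targets a different partition, so a long list fans the query out across many nodes; issuing one small per-day query in parallel from the client is the usual alternative.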

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Jack Krupansky
Unfortunately, a non-equal relation on a partition key is not supported. You would need to bucket by some larger unit, like a month, and then use the date/time as a clustering column under that partition key. Then you could query within the partition. The partition key determines which node can satisfy the

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
Hi, we plan to have 10^8 users and each user could generate 10 events per day. So we have: 10^8 records per day, 10^8*30 records per month. Our time-window analysis could be from 1 to 6 months. Right now the PK is PRIMARY KEY (user_id, event_ts), where event_ts is the exact timestamp of the event. So you suggest this approac

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Jack Krupansky
It depends on the actual number of events per user, but simply bucketing the partition key can give you the same effect - clustering rows by time range. A composite partition key could be composed of the user name and the date. It also depends on the data rate - is it many events per day or just
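
A rough sketch of that composite partition key idea, one bucket per user per day (table and column names are illustrative, not from the thread):

    CREATE TABLE events_by_user_day (
        user_id         timeuuid,
        ymd             int,        -- e.g. 20150404; the per-day bucket
        event_ts        timestamp,
        event_type      int,
        some_other_attr text,
        PRIMARY KEY ((user_id, ymd), event_ts)
    );

Day buckets keep each partition small at the cost of touching many partitions for a multi-month scan, which is why the rest of the thread leans toward a month bucket instead.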

Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
Hi, I switched from HBase to Cassandra and am trying to find a solution for time-series analysis on top of Cassandra. I have an entity named "Event". "Event" has the attributes: user_id - the user who triggered the event, event_ts - when the event happened, event_type - the type of event, some_other_attr - some other attrs we
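
A minimal sketch of the table those attributes imply, assuming user_id as the partition key and event_ts as the clustering column (as in the rest of this thread):

    CREATE TABLE events (
        user_id         timeuuid,
        event_ts        timestamp,
        event_type      int,
        some_other_attr text,
        PRIMARY KEY (user_id, event_ts)
    );

With this layout every event a user ever generates lands in one partition, which is what motivates the bucketing discussion in the replies.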