You need to track node membership separately. I do that in a SQL database, but you can use Cassandra for that too. For example:

rowkey = cluster name
column name Composite[<epoch_time>:<node_name>] = [join|leave]

Then every time a node joins or leaves a cluster, write an entry. You can then read the row (ordered by epoch time) to build your list of active nodes for a given time period. Note that you can set an ending read range, but you basically have to start reading from 0, because a node that joined before your period started may still be a member during it.

Note that this layout is really for figuring out which nodes are in a cluster for a given period of time. You wouldn't want to model it that way if you wanted to know which cluster(s) a single node was in over a given period of time. In that case you'd model it this way:

rowkey = node name
column name Composite[<epoch_time>:<cluster_name>] = [join|leave]

Depending on your needs, you may end up using both!

On Fri, Aug 10, 2012 at 1:34 AM, <dinesh.simkh...@gridcore.se> wrote:
> Thanks Aaron for your reply,
> creating vector for raw data is good work around for decreasing disk space,
> but I am not still clear tracking time for nodes, say if we want a query like
> give me the list of nodes for a cluster between this period of time then how
> do we get that information? do we scan through each node row as we will have
> row for each node?
>
> thanks
>
> -----Aaron Turner <synfina...@gmail.com> wrote: -----
> To: user@cassandra.apache.org
> From: Aaron Turner <synfina...@gmail.com>
> Date: 08/09/2012 07:38PM
> Subject: Re: Cassandra data model help
>
> On Thu, Aug 9, 2012 at 5:52 AM, <dinesh.simkh...@gridcore.se> wrote:
>> Hi,
>> I am trying to create a Cassandra schema for cluster monitoring system,
>> where one cluster can have multiple nodes and I am monitoring multiple
>> matrices from a node.
>> My raw data schema looks like and taking values in
>> every 5 min interval
>>
>> matrix_name + daily time stamp as row key, composite column name of node
>> name and time stamp and matrix value as column value
>>
>> the problem I am facing is a node can go back and forth between the
>> clusters(system can have more than one clusters) so if i need monthly
>> statistics plotting of a cluster I have to consider the nodes that are
>> leaving and joining during this period of time, some node might be part of
>> the cluster for just 15 days and some could join the cluster last 10 day of
>> month, so to plot data for a particular cluster for a time interval I need
>> to know the nodes which were part of that cluster for that period of time,
>> what could be the best schema for this solution ? I have tried few ideas so
>> far no luck, any suggestions ?
>
> Store each node stat in it's own row. Then decide if you want to
> track when a node joins/leaves a cluster so you can build the aggs on
> the fly or just store cluster aggregates in their own row as well. If
> the latter, depending on your polling methodology, you may want to use
> counters for the cluster aggregates.
>
> Also, if you're doing 5 min intervals with each row = 1 day, then your
> disk space usage is going to grow pretty quickly due to per-column
> overhead. You didn't say what the values are that you're storing,
> but if they're just 64bit integers or something like that, most of
> your disk space is actually being used for column overhead not your
> data.
>
> I worked around this by creating a 2nd CF, where each row = 1 year
> worth of data and each column = 1 days worth of data. The values are
> just a vector of the 5min values from the original CF. Then I just
> have a cron job which reads the previous days data and builds the
> vectors in the new CF and then deletes the original row. By doing
> this, my disk space requirements (before replication) went from over
> 1.1TB/year to 305GB/year.
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"
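
P.S. For the cluster-keyed membership row described at the top of this reply, here is a rough sketch in Python of how replaying the join/leave entries could work. It assumes the row's columns have already been fetched into a time-ordered list of (epoch_time, node_name, action) tuples. The function name, the sample events and the half-open [start, end) window are my own illustration, not part of any Cassandra client API.

```python
def membership_intervals(events, start, end):
    """Replay one cluster's join/leave log and return, per node, the
    intervals during which it was a member, clipped to [start, end).

    `events` must begin at the first column of the row and be sorted by
    epoch time, because a node that joined before `start` may still be
    a member inside the window.
    """
    joined_at = {}   # node -> epoch time of its currently open join
    intervals = {}   # node -> list of (from, to) membership intervals

    for ts, node, action in events:
        if ts >= end:
            break  # the ending read range: later entries cannot matter
        if action == "join":
            # Ignore a repeated join while the node is already a member.
            joined_at.setdefault(node, ts)
        elif action == "leave":
            began = joined_at.pop(node, None)
            # Keep the interval only if it reaches into the window.
            if began is not None and ts > start:
                intervals.setdefault(node, []).append((max(began, start), ts))

    # Nodes with an open join are still members when the window ends.
    for node, began in joined_at.items():
        intervals.setdefault(node, []).append((max(began, start), end))

    return intervals


# Hypothetical log for one cluster row, ordered by epoch time.
EVENTS = [
    (100, "node-a", "join"),
    (150, "node-b", "join"),
    (200, "node-b", "leave"),
    (300, "node-c", "join"),
    (400, "node-a", "leave"),
    (500, "node-b", "join"),
]

if __name__ == "__main__":
    for node, spans in sorted(membership_intervals(EVENTS, 250, 450).items()):
        print(node, spans)
```

With the sample log, the window 250 to 450 yields node-a for 250-400 and node-c for 300-450; node-b is left out because it left at 200 and only rejoined at 500. The per-node intervals are what you would then use to decide which node stat rows to read when plotting the cluster.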