You need to track node membership separately.  I do that in a SQL
database, but you can use Cassandra for that too.  For example:

rowkey = cluster name
column name  Composite[ <epoch_time>:<node_name>] = [join|leave]

Then every time a node joins or leaves a cluster, write an entry.
You can then read the row (columns are ordered by epoch time) to build
your list of active nodes for a given time period.  Note that you can
set an ending read range, but you basically have to start reading from
0, since a node that joined before your period starts may still be a
member.
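
To make the read side concrete, here is a minimal Python sketch of
replaying such a row.  It assumes the columns have already been fetched
into a list of ((epoch_time, node_name), action) pairs; the function name
and the in-memory representation are made up for illustration and are not
tied to any particular Cassandra client.

```python
from typing import Iterable, Set, Tuple

# One column from the membership row: ((epoch_time, node_name), "join" | "leave")
Event = Tuple[Tuple[int, str], str]


def nodes_active_between(events: Iterable[Event], start: int, end: int) -> Set[str]:
    """Return the nodes that were cluster members at any point in [start, end]."""
    members: Set[str] = set()  # membership as of the last replayed event
    active: Set[str] = set()   # nodes seen as members inside the window
    in_window = False

    # Replay from the beginning of the row; stop once we pass the end of the range.
    for (ts, node), action in sorted(events):
        if ts > end:
            break
        if ts >= start and not in_window:
            # First event inside the window: everyone who joined earlier
            # and has not left yet counts as active.
            active |= members
            in_window = True
        if action == "join":
            members.add(node)
            if ts >= start:
                active.add(node)
        elif action == "leave":
            members.discard(node)

    if not in_window:
        # No events fell inside the window, so membership did not change in it.
        active |= members
    return active
```

The replay has to begin at the first column because the membership at
`start` depends on every join and leave before it; only the upper bound
lets you stop early.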

Note that this model is really for figuring out which nodes are in a
cluster for a given period of time.  You wouldn't want to model it that
way if you wanted to know which cluster(s) a single node was in over a
given period of time.  In that case you'd model it this way:

rowkey = node name
column name  Composite[ <epoch_time>:<cluster_name>] = [join|leave]

Depending on your needs, you may end up using both!
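
If you do keep both, each membership change becomes two writes.  A rough
Python sketch of that write path follows; plain dicts stand in for the two
column families, and the names nodes_by_cluster, clusters_by_node, and
record_membership are hypothetical, not part of any Cassandra API.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Stand-ins for the two column families (hypothetical names):
# row key -> {(epoch_time, other_name): "join" | "leave"}
Row = Dict[Tuple[int, str], str]
nodes_by_cluster: Dict[str, Row] = defaultdict(dict)  # rowkey = cluster name
clusters_by_node: Dict[str, Row] = defaultdict(dict)  # rowkey = node name


def record_membership(cluster: str, node: str, epoch_time: int, action: str) -> None:
    """Write one join/leave event to both rows so either question can be answered."""
    if action not in ("join", "leave"):
        raise ValueError("action must be 'join' or 'leave'")
    nodes_by_cluster[cluster][(epoch_time, node)] = action
    clusters_by_node[node][(epoch_time, cluster)] = action


# Example: n1 moves from cluster c1 to cluster c2 at time 300.
record_membership("c1", "n1", 100, "join")
record_membership("c1", "n1", 300, "leave")
record_membership("c2", "n1", 300, "join")
```

Sorting the items of either row by column name gives the time-ordered
event list that a real composite-column row would return.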



On Fri, Aug 10, 2012 at 1:34 AM,  <dinesh.simkh...@gridcore.se> wrote:
> Thanks Aaron for your reply.
> Creating a vector for the raw data is a good workaround for decreasing
> disk space, but I am still not clear on tracking time for nodes.  Say we
> want a query like "give me the list of nodes for a cluster between this
> period of time": how do we get that information?  Do we scan through each
> node row, since we will have a row for each node?
>
> thanks
>
> -----Aaron Turner <synfina...@gmail.com> wrote: -----
> To: user@cassandra.apache.org
> From: Aaron Turner <synfina...@gmail.com>
> Date: 08/09/2012 07:38PM
> Subject: Re: Cassandra data model help
>
> On Thu, Aug 9, 2012 at 5:52 AM,  <dinesh.simkh...@gridcore.se> wrote:
>> Hi,
>> I am trying to create a Cassandra schema for a cluster monitoring system,
>> where one cluster can have multiple nodes and I am monitoring multiple
>> matrices from each node.  Values are taken at 5 minute intervals, and my
>> raw data schema looks like this:
>>
>> matrix_name + daily time stamp as the row key, a composite column name of
>> node name and time stamp, and the matrix value as the column value
>>
>> The problem I am facing is that a node can go back and forth between
>> clusters (the system can have more than one cluster).  So if I need to
>> plot monthly statistics for a cluster, I have to consider the nodes that
>> left and joined during that period: some nodes might be part of the
>> cluster for just 15 days, and some could join the cluster for the last 10
>> days of the month.  To plot data for a particular cluster over a time
>> interval, I need to know which nodes were part of that cluster for that
>> period of time.  What would be the best schema for this?  I have tried a
>> few ideas with no luck so far.  Any suggestions?
>
> Store each node stat in its own row.  Then decide if you want to
> track when a node joins/leaves a cluster so you can build the aggs on
> the fly or just store cluster aggregates in their own row as well.  If
> the latter, depending on your polling methodology, you may want to use
> counters for the cluster aggregates.
>
> Also, if you're doing 5 min intervals with each row = 1 day, then your
> disk space usage is going to grow pretty quickly due to per-column
> overhead.   You didn't say what the values are that you're storing,
> but if they're just 64bit integers or something like that, most of
> your disk space is actually being used for column overhead not your
> data.
>
> I worked around this by creating a 2nd CF, where each row = 1 year's
> worth of data and each column = 1 day's worth of data.  The values are
> just a vector of the 5min values from the original CF.  Then I just
> have a cron job which reads the previous day's data, builds the
> vectors in the new CF, and then deletes the original row.  By doing
> this, my disk space requirements (before replication) went from over
> 1.1TB/year to 305GB/year.
>



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
