AFAIK that's not really the purpose of the dynamic schema functions. You may run into problems: the caches are per-CF, and each CF has a high memory overhead (roughly 3 × the memtable size in MB), so your memory usage will jump around.
CloudKick gathers a lot of metrics; this may help: http://wiki.apache.org/cassandra/ArchitectureCommitLog

If you want to use Hadoop for the analysis, and the data really can be thrown away, then I would consider using Hadoop by itself. Take a look at Flume from Cloudera to stream data into HDFS: http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

Hope that helps.
Aaron

On 22 Oct 2010, at 04:12, Utku Can Topçu wrote:

> Hi All,
>
> In the current project I'm working on, I have a use case for hourly analyzing
> the rows.
>
> Since the 0.7.x branch supports creating and dropping column families on the
> fly, my use case proposal is:
>
> * Create a CF at the very beginning of every hour
> * At the end of the 1-hour period, analyze the data stored in the CF with
> Hadoop
> * Drop the CF afterwards.
>
> Can you foresee any problems in continuously creating and dropping
> column families?
>
> Regards,
> Utku
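For what it's worth, the hourly create/analyze/drop rotation described above could be sketched roughly as below. This is only an illustration, not a recommendation: the `events` prefix, the `rotate` helper, and the keyspace name are all assumptions, and the schema calls shown in comments assume a pycassa-style `SystemManager` against a live Cassandra 0.7+ cluster.

```python
from datetime import datetime

def hourly_cf_name(ts, prefix="events"):
    """Derive a per-hour column family name, e.g. events_2010102204."""
    return "%s_%s" % (prefix, ts.strftime("%Y%m%d%H"))

def rotate(sys_mgr, keyspace, now, previous):
    """Hypothetical rotation step, run once at the top of each hour.

    `sys_mgr` is assumed to expose pycassa-style schema calls; this is a
    sketch only and needs a running cluster to actually execute.
    """
    current = hourly_cf_name(now)
    # Create the CF for the hour that is starting:
    # sys_mgr.create_column_family(keyspace, current)
    # ... writers insert into `current`; a Hadoop job reads `previous` ...
    # Drop the old CF once the analysis job has finished:
    # sys_mgr.drop_column_family(keyspace, previous)
    return current
```

Note that each create/drop is a cluster-wide schema change, so doing this every hour exercises schema migration machinery that (as mentioned above) was not really designed for this access pattern.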