Re: Cassandra and count

aaron morton Sun, 30 Jan 2011 00:29:50 -0800

There are two functions in the 0.7 API (http://wiki.apache.org/cassandra/API)
to count the columns in a row: get_count() and multiget_count() (not listed on
the wiki yet). Both take a SlicePredicate, whose slice range may have an empty
start and end.

The only way to count rows is to use get_range_slices(), which will return the
columns requested. To reduce the bandwidth the query uses, ask it to return a
single column.
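
A rough sketch of that row-counting loop, in Python. It is not the Thrift API
itself: fetch_page is a hypothetical wrapper you would write around the
range-slice call described above (asking for one column per row and returning
only the row keys), and make_fake_fetch is an in-memory stand-in so the loop
can be run without a cluster. It assumes the start key of a range is
inclusive, so each page after the first repeats the last key of the previous
page.

```python
from typing import Callable, Iterable, List


def count_rows(fetch_page: Callable[[str, int], List[str]],
               page_size: int = 1000) -> int:
    """Count rows by paging through row keys.

    fetch_page(start_key, limit) is a hypothetical helper that returns up to
    `limit` row keys in storage order, starting at start_key (inclusive).
    An empty start_key means "from the first row".
    """
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")

    total = 0
    start_key = ""
    while True:
        page = fetch_page(start_key, page_size)
        # Drop the repeated first key on every page after the first.
        if start_key != "" and page and page[0] == start_key:
            new_keys = page[1:]
        else:
            new_keys = page
        total += len(new_keys)
        # A short page, or a page with nothing new, means we reached the end.
        if len(page) < page_size or not new_keys:
            return total
        start_key = page[-1]


def make_fake_fetch(all_keys: Iterable[str]) -> Callable[[str, int], List[str]]:
    """In-memory stand-in for a cluster, for trying the loop locally."""
    ordered = sorted(all_keys)

    def fetch_page(start_key: str, limit: int) -> List[str]:
        return [k for k in ordered if k >= start_key][:limit]

    return fetch_page
```

For example, count_rows(make_fake_fetch(keys), page_size=10) walks the fake
key list ten keys at a time. Against a live cluster the result is only as
accurate as the moment each page was read.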

However, the return from these functions is not guaranteed to be correct.
Cassandra does not lock its internal structures, so while it's busy processing
your request other connections may be adding columns and rows. By the time it
returns to you, the count may already be wrong. You can apply the same
reasoning to why there are no aggregate functions.
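
A toy model of that race, in Python. It says nothing about Cassandra's
internals: the function name and the writes_during_scan argument are made up
for illustration, and the "concurrent connection" is simulated by inserting a
key at a chosen step of the scan.

```python
from typing import Dict, Iterable, Tuple


def count_while_writing(initial_keys: Iterable[str],
                        writes_during_scan: Dict[int, str]) -> Tuple[int, int]:
    """Count keys one at a time while a simulated second client adds keys.

    writes_during_scan maps a 0-based scan step to a key that is inserted
    right after that step. Returns (count reported by the scan, number of
    keys actually stored when the scan finishes).
    """
    store = set(initial_keys)
    to_visit = sorted(store)  # the scan only sees what existed at the start
    counted = 0
    for step, _key in enumerate(to_visit):
        counted += 1
        if step in writes_during_scan:
            # Simulated concurrent insert; the scan in progress never sees it.
            store.add(writes_during_scan[step])
    return counted, len(store)
```

Calling count_while_writing(["a", "b", "c"], {1: "zz"}) gives a reported count
that is one lower than the number of keys stored by the time the scan ends.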

Do you need to count the rows as a one-off, or is it part of your application
design?

Hope that helps
Aaron

On 29 Jan 2011, at 05:02, Victor Kabdebon wrote:

> Buddhasystem is right.
> A count returns columns to the client, which counts them. My advice: do not
> count big columns / supercolumns. People in the dev team are trying to
> develop distributed counters but I don't know the state of this research.
> 
> Best regards,
> Victor Kabdebon
> http://www.voxnucleus.fr
> 
> 2011/1/28 buddhasystem <potek...@bnl.gov>
> 
> As far as I know, there are no aggregate operations built into Cassandra,
> which means you'll have to retrieve all of the data to count it in the
> client. I had a thread on this topic 2 weeks ago. It's pretty bad.
> 
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>
