Because the data for an index is not stored all together (i.e., you need a 
multiget to fetch the data). It is not contiguous.

With a prefix in a partition, the data is kept together, so from what I 
understand all data for a given prefix is contiguous.
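
To make the contrast concrete, here is a rough sketch using a toy in-memory 
model. This is not Cassandra's storage engine or API: the key names, the 
`index` dict, and both helper functions are made up for illustration. It only 
shows that a prefix maps to one contiguous range of sorted keys, while an index 
entry points at rows scattered across the key space, each needing its own get.

```python
from bisect import bisect_left

def prefix_scan(sorted_keys, prefix):
    """Return the keys sharing a prefix plus the (start, end) range they occupy."""
    start = bisect_left(sorted_keys, prefix)
    end = start
    # Keys with the same prefix sit next to each other in sorted order.
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end], (start, end)

def index_lookup(sorted_keys, index, value):
    """Return the position of each row the index points at (one get per row)."""
    return [bisect_left(sorted_keys, key) for key in index[value]]

# Layout 1 (hypothetical): the virtual column family name is a key prefix.
prefixed = sorted(["orders:1", "orders:2", "orders:3", "users:1", "users:2"])
rows, span = prefix_scan(prefixed, "orders:")

# Layout 2 (hypothetical): arbitrary keys, with an index on a tag field.
unprefixed = sorted(["k1", "k2", "k3", "k4", "k5"])
index = {"orders": ["k1", "k3", "k5"], "users": ["k2", "k4"]}
positions = index_lookup(unprefixed, index, "orders")

print(rows, span)   # one contiguous range
print(positions)    # scattered positions, one lookup each
```

In the first layout a single range read covers every matching row; in the 
second, the index only yields keys, and the rows still have to be fetched one 
by one.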

QUESTION: What I don't get in the comment: I assume you are referring to CQL, 
in which case we would need to specify the partition (in addition to the 
index), which means all that data is on one node, correct? Or did I miss 
something there?

Thanks,
Dean

From: Ben Hood <0x6e6...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 2, 2012 11:18 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: 1000's of column families

Jeremy,

On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:

Another option that may or may not work for you is the support in Cassandra 
1.1+ to use a secondary index as an input to your mapreduce job. What you might 
do is add a field to the column family that represents which virtual column 
family that it is part of. Then when doing mapreduce jobs, you could use that 
field as the secondary index limiter. Secondary index mapreduce is not as 
efficient since you first get all of the keys and then do multigets to get the 
data that you need for the mapreduce job. However, it's another option for not 
scanning the whole column family.

Interesting. This is probably a stupid question but why shouldn't you be able 
to use the secondary index to go straight to the slices that belong to the 
attribute you are searching by? Is this something to do with the way Cassandra 
is exposed as an InputFormat for Hadoop or is this a general property for 
searching by secondary index?

Ben
