Re: 1000's of column families

Ben Hood Tue, 02 Oct 2012 12:02:21 -0700

Dean,

On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

> Because the data for an index is not all together (i.e., you need a multiget
> to get the data). It is not contiguous.
> 
> With the prefix in a partition, they keep the data together, so from what I
> understand all data for a prefix is contiguous.
> 

So you're saying that you can access the primary index with a key range, but to 
access the secondary index, you first need to get all keys and follow up with a 
multiget, which would use the secondary index to speed the lookup of the 
matching rows?
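To make the contrast concrete, here is a toy Python model. It is an assumption for illustration, not Cassandra's storage engine or client API. Entries are kept in sorted key order (standing in for the sorted layout within one partition), so a prefix read is a single contiguous slice. A lookup through the secondary index instead collects matching keys that may be scattered across the key space and then fetches each one, multiget-style. `ToyColumnFamily`, `range_scan`, `index_lookup` and the "seeks" count are hypothetical names.

```python
import bisect


class ToyColumnFamily:
    """Hypothetical toy model, not Cassandra's storage engine.

    Entries are kept sorted by key (standing in for the sorted layout within
    one partition), and a secondary index maps (column, value) to entry keys.
    """

    def __init__(self):
        self.keys = []    # sorted entry keys
        self.rows = {}    # entry key -> {column: value}
        self.index = {}   # (column, value) -> set of entry keys

    def insert(self, key, columns):
        # Toy simplification: keys are write-once, so the index never goes stale.
        if key in self.rows:
            raise ValueError(f"duplicate key: {key}")
        bisect.insort(self.keys, key)
        self.rows[key] = columns
        for col, val in columns.items():
            self.index.setdefault((col, val), set()).add(key)

    def range_scan(self, prefix):
        """Prefix read: one seek, then a contiguous walk over sorted keys."""
        start = bisect.bisect_left(self.keys, prefix)
        out = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self.rows[key]))
        return out, 1  # (rows, seeks)

    def index_lookup(self, column, value):
        """Index read: collect the matching keys, then fetch each one (a multiget)."""
        keys = sorted(self.index.get((column, value), ()))
        return [(key, self.rows[key]) for key in keys], len(keys)  # one seek per key


cf = ToyColumnFamily()
cf.insert("user:alice", {"state": "CO"})
cf.insert("user:bob", {"state": "TX"})
cf.insert("user:carol", {"state": "CO"})
cf.insert("order:1", {"state": "CO"})

rows, seeks = cf.range_scan("user:")
print([k for k, _ in rows], seeks)   # contiguous slice: a single seek
rows, seeks = cf.index_lookup("state", "CO")
print([k for k, _ in rows], seeks)   # scattered keys: one fetch per match
```

In this model the index saves you from scanning everything, but it does not make the matching rows adjacent, which is the point Dean makes about contiguity.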

> 
> QUESTION: What I don't get in the comment is this: I assume you are referring
> to CQL, in which case we would need to specify the partition (in addition to
> the index), which means all that data is on one node, correct? Or did I miss
> something there?

Maybe my question was just silly; I wasn't referring to CQL.

As for data locality, I was hoping to be able to fire off an MR job to process
all matching rows in the CF, and I was assuming that this job would be executed
on the same node as the data.
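Data-local execution in Hadoop is a scheduling preference rather than a guarantee: each input split reports the hosts that hold its data (the `InputSplit.getLocations()` hint), and the scheduler tries to run the task on one of those hosts. The sketch below is a simplified greedy model of that idea, not Hadoop's actual scheduler; `assign_splits`, the split tuples and the host names are hypothetical.

```python
def assign_splits(splits, workers):
    """Greedy toy scheduler (hypothetical, not Hadoop's real scheduler).

    splits:  list of (split_id, [hosts holding a replica of that split's data])
    workers: list of hostnames, each with exactly one free task slot
    Returns {split_id: (chosen_host, ran_locally)}.
    """
    if len(workers) < len(splits):
        raise ValueError("toy model assumes one free slot per split")
    free = list(workers)
    plan = {}
    for split_id, hosts in splits:
        # Prefer a free worker that already holds the data for this split.
        local = next((w for w in free if w in hosts), None)
        chosen = local if local is not None else free[0]  # otherwise read remotely
        free.remove(chosen)
        plan[split_id] = (chosen, local is not None)
    return plan


splits = [
    ("range-0", ["node1", "node2"]),
    ("range-1", ["node2", "node3"]),
    ("range-2", ["node3", "node1"]),
]
# node3 is not among the workers and node1's only slot goes to range-0,
# so range-2 ends up on node4 and reads its data over the network.
print(assign_splits(splits, ["node1", "node2", "node4"]))
```

The design point is that locality only helps when a task slot is free on a replica host; otherwise the task still runs, just remotely.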

But I think the real confusion in my question has to do with the way
ColumnFamilyInputFormat has been implemented: it appears to ingest the entire
(non-OPP) CF into Hadoop, so the predicate has to be applied in the job rather
than up front in the Cassandra query.
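The cost difference can be illustrated with a small model (hypothetical, not the `ColumnFamilyInputFormat` API). When the whole CF is handed to the job, every row is shipped and the predicate is evaluated in the map phase; if the store could evaluate the predicate first, only matching rows would be shipped. Both paths return the same matches; only the number of rows handed to the job changes. `run_job`, `push_down`, `is_colorado` and the sample rows are assumptions for illustration.

```python
def run_job(rows, predicate, push_down):
    """Toy model of a filtering MR job (hypothetical, not a Cassandra/Hadoop API).

    Returns (matching keys, number of rows handed to the job).
    """
    if push_down:
        # Hypothetical alternative: the store applies the predicate first.
        source = [(key, cols) for key, cols in rows if predicate(cols)]
    else:
        # Behaviour described in the email: the entire CF is ingested into the job.
        source = rows
    matches = []
    for key, cols in source:  # the "map" phase
        if predicate(cols):
            matches.append(key)
    return matches, len(source)


def is_colorado(cols):
    return cols.get("state") == "CO"


rows = [
    ("a", {"state": "CO"}),
    ("b", {"state": "TX"}),
    ("c", {"state": "CO"}),
    ("d", {"state": "NY"}),
]

print(run_job(rows, is_colorado, push_down=False))  # filter applied in the job
print(run_job(rows, is_colorado, push_down=True))   # filter applied in the store
```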

Cheers,

Ben
