Dean,

On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:
> Because the data for an index is not all together(ie. Need a multi get to get
> the data). It is not contiguous.
>
> The prefix in a partition they keep the data so all data for a prefix from
> what I understand is contiguous.

So you're saying that you can access the primary index with a key range, but to access the secondary index, you first need to get all keys and follow up with a multiget, which would use the secondary index to speed the lookup of the matching rows?

> QUESTION: What I don't get in the comment is I assume you are referring to
> CQL in which case we would need to specify the partition (in addition to the
> index)which means all that data is on one node, correct? Or did I miss
> something there.

Maybe my question was just silly - I wasn't referring to CQL. As for the locality of the data, I was hoping to be able to fire off an MR job to process all matching rows in the CF, and I was assuming that this job would get executed on the same node as the data.

But I think the real confusion in my question has to do with the way the ColumnFamilyInputFormat has been implemented: it appears to ingest the entire (non-OPP) CF into Hadoop, so the predicate has to be applied in the job rather than up front in the Cassandra query.

Cheers,
Ben