Nice!  Thanks Ed.

On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote:

> Hey all,
> 
> I know there are several tickets in the pipe that should make it possible to 
> use secondary indexes to run map reduce jobs that do not have to ingest the 
> entire dataset, such as:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-1600
> 
> I ended up creating a sharded secondary index in user space (I just call 
> it ordered buckets), described here:
> 
> http://www.slideshare.net/edwardcapriolo/casbase-presentation/27
> 
> Looking at the ordered buckets implementation I realized it is a perfect 
> candidate for "efficient map reduce" since it is easy to split.
> 
> A unit test of that implementation is here:
> 
> https://github.com/edwardcapriolo/casbase/blob/master/src/test/java/com/jointhegrid/casbase/hadoop/OrderedBucketInputFormatTest.java
> 
> With this you can currently do efficient map reduce on Cassandra data while 
> waiting for other integrated solutions to come along.
> 

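A minimal sketch of why a bucketed index is easy to split. It assumes index entries are spread over a fixed number of bucket rows by hashing the row key. The names `BucketSplitSketch`, `bucketFor` and `splits` are hypothetical and are not the casbase `OrderedBucketInputFormat` API. Each bucket range stands in for one Hadoop input split, so a mapper would read only its own bucket rows rather than the entire dataset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the casbase implementation.
public class BucketSplitSketch {

    /** One hypothetical input split covering buckets firstBucket..lastBucket (inclusive). */
    public static final class BucketRange {
        public final int firstBucket;
        public final int lastBucket;

        public BucketRange(int firstBucket, int lastBucket) {
            this.firstBucket = firstBucket;
            this.lastBucket = lastBucket;
        }

        @Override
        public String toString() {
            return "buckets " + firstBucket + ".." + lastBucket;
        }
    }

    /** Assumed placement rule: hash the indexed row key into one of numBuckets bucket rows. */
    public static int bucketFor(String rowKey, int numBuckets) {
        return Math.floorMod(rowKey.hashCode(), numBuckets);
    }

    /**
     * Divide bucket ids 0..numBuckets-1 into at most numSplits contiguous,
     * non-overlapping ranges of near-equal size; each range would back one mapper.
     */
    public static List<BucketRange> splits(int numBuckets, int numSplits) {
        if (numBuckets <= 0 || numSplits <= 0) {
            throw new IllegalArgumentException("numBuckets and numSplits must be positive");
        }
        int n = Math.min(numSplits, numBuckets); // never create an empty split
        int base = numBuckets / n;
        int extra = numBuckets % n;              // the first 'extra' splits get one more bucket
        List<BucketRange> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            result.add(new BucketRange(start, start + size - 1));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 buckets over 4 mappers: buckets 0..2, 3..5, 6..7, 8..9
        for (BucketRange r : splits(10, 4)) {
            System.out.println(r);
        }
        System.out.println("\"user42\" -> bucket " + bucketFor("user42", 10));
    }
}
```

Because the bucket count is fixed and known up front, the splits can be computed without scanning any data. An actual input format would still have to turn each range into reads against the corresponding bucket rows.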