Nice! Thanks Ed.

On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote:
> Hey all,
>
> I know there are several tickets in the pipeline that should make it possible to
> use secondary indexes to run MapReduce jobs that do not have to ingest the
> entire dataset, such as:
>
> https://issues.apache.org/jira/browse/CASSANDRA-1600
>
> I ended up creating a sharded secondary index in user space (I just call
> it ordered buckets), described here:
>
> http://www.slideshare.net/edwardcapriolo/casbase-presentation/27
>
> Looking at the ordered buckets implementation, I realized it is a perfect
> candidate for "efficient map reduce" since it is easy to split.
>
> A unit test of that implementation is here:
>
> https://github.com/edwardcapriolo/casbase/blob/master/src/test/java/com/jointhegrid/casbase/hadoop/OrderedBucketInputFormatTest.java
>
> With this you can currently do efficient map reduce on Cassandra data while
> waiting for other integrated solutions to come along.
>
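For anyone who hasn't seen the slides, here is a rough sketch of why a bucketed index is easy to split. It assumes the index is spread across a fixed number of bucket rows, so each bucket can become its own independent input split. The class name, the `name:bucket` row-key scheme, and the hash-based bucket assignment are hypothetical illustrations. They are not casbase's actual code and not Hadoop's `InputSplit` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an index sharded across a fixed number of bucket rows,
// where each bucket row becomes one independent input split.
public class BucketSplits {

    // Hypothetical split descriptor: which index and which bucket row to read.
    public static final class BucketSplit {
        public final String indexName;
        public final int bucket;

        public BucketSplit(String indexName, int bucket) {
            this.indexName = indexName;
            this.bucket = bucket;
        }

        // Hypothetical row-key scheme for a bucket row; not casbase's format.
        public String rowKey() {
            return indexName + ":" + bucket;
        }
    }

    // Assign an indexed value to a bucket; floorMod keeps negative hashes in range.
    public static int bucketFor(String value, int numBuckets) {
        return Math.floorMod(value.hashCode(), numBuckets);
    }

    // One split per bucket. Buckets are disjoint, so mappers never overlap,
    // and each mapper reads one index bucket instead of the entire dataset.
    public static List<BucketSplit> getSplits(String indexName, int numBuckets) {
        List<BucketSplit> splits = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            splits.add(new BucketSplit(indexName, b));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Prints users_by_state:0 through users_by_state:3, one line per split.
        for (BucketSplit s : getSplits("users_by_state", 4)) {
            System.out.println(s.rowKey());
        }
    }
}
```

The number of splits is fixed by the bucket count chosen when the index is written. That bucket count is also the upper bound on map-side parallelism in this sketch.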
