For background, you may find the wide row setting useful: http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
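For example, wide rows are switched on through the four-argument overload of ConfigHelper.setInputColumnFamily. A minimal sketch, assuming a Hadoop Job object; the keyspace and column family names are placeholders:

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

Configuration conf = job.getConfiguration();
// The trailing boolean enables wide row support: the input format pages
// through each row and hands the mapper slices of columns rather than
// requiring the whole row to fit in memory at once.
ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyColumnFamily", true);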
AFAIK all the input row readers for Hadoop do range scans, and I think support for setting the start and end token exists so that jobs only select data which is local to the node. It's not really possible to select individual rows by token.

If you had a secondary index on the row you could use the setInputRange overload that takes an index expression (see the rough sketch at the bottom of this mail). Or it may be easier to use Hive.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/12/2012, at 3:04 PM, Jamie Rothfeder <[email protected]> wrote:

> Hey All,
>
> I have a bunch of time-series data stored in a cluster using a
> ByteOrderedPartitioner. My keys are time buckets representing events that
> occurred in an hour. I've been trying to write a mapreduce job that
> considers only events within a certain time range by specifying an input
> range, but this doesn't seem to be working.
>
> I expect the following code to scan data for a single key (1353456000),
> but it is scanning all keys.
>
> int key = 1353456000;
> IPartitioner part = ConfigHelper.getInputPartitioner(job.getConfiguration());
> Token token = part.getToken(ByteBufferUtil.bytes(key));
> ConfigHelper.setInputRange(job.getConfiguration(),
>     part.getTokenFactory().toString(token),
>     part.getTokenFactory().toString(token));
>
> Any idea what I'm doing wrong?
>
> Thanks,
> Jamie
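Rough sketch of the index-expression route mentioned above, assuming Cassandra 1.1's ConfigHelper and Thrift types; the indexed column name "bucket" and its value are placeholders for whatever your schema actually indexes:

import java.util.Arrays;

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;

Configuration conf = job.getConfiguration();
// Rather than start/end tokens, filter rows server side with an index clause.
// At least one expression must be an equality on an indexed column.
IndexExpression expr = new IndexExpression(
        ByteBufferUtil.bytes("bucket"),    // indexed column name (placeholder)
        IndexOperator.EQ,
        ByteBufferUtil.bytes(1353456000)); // the hour bucket to select
ConfigHelper.setInputRange(conf, Arrays.asList(expr));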
