Hi Paul,

You want to run multiple scans so that you can filter the previous scan's results? Am I correct in my understanding of your objective?
First, I suggest you use the PrefixFilter [0] instead of constructing the rowkey prefix manually. That looks something like:

  byte[] md5Key = Utils.md5("2013-01-07");
  Scan scan = new Scan(md5Key);
  scan.setFilter(new PrefixFilter(md5Key));

Yes, that's a bit redundant, but setting the startkey explicitly will save you some unnecessary processing.

> This map reduce job works fine but this is just one scan job for this
> map reduce task. What do I have to do to pass multiple scans?

Do you mean processing multiple dates? In that case, what you really want is a full (unbounded) table scan. Since the date is the first part of your compound rowkey, there is no common prefix and no need for a filter; just use new Scan().

In general, you can combine multiple filters in a given Scan (or Get); see FilterList [1] for details.

Does this help?

Nick

[0]: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html
[1]: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html

On Tue, Feb 26, 2013 at 5:41 AM, Paul van Hoven <[email protected]> wrote:
> My rowkeys look something like this:
>
> md5( date ) + md5( ip address )
>
> So an example would be
> md5( "2013-02-08" ) + md5( "192.168.187.2" )
>
> For one particular date I got several rows. Now I'd like to query
> different dates, for example "2013-01-01" and "2013-02-01" and some
> other. Additionally I'd like to perform this or these scans in a map
> reduce job.
>
> Currently my map reduce job looks like this:
>
> Configuration config = HBaseConfiguration.create();
> Job job = new Job(config, "ToyJob");
> job.setJarByClass( PlayWithMapReduce.class );
>
> byte[] md5Key = Utils.md5( "2013-01-07" );
> int md5Length = 16;
> int longLength = 8;
>
> byte[] startRow = Bytes.padTail( md5Key, longLength ); // append "0 0 0 0 0 0 0 0"
> byte[] endRow = Bytes.padTail( md5Key, longLength );
> endRow[md5Length-1]++; // last byte gets counted up
>
> Scan scan = new Scan( startRow, endRow );
> scan.setCaching(500);
> scan.setCacheBlocks(false);
>
> Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
> Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
> scan.setFilter(f);
>
> String tableName = "ToyDataTable";
> TableMapReduceUtil.initTableMapperJob( tableName, scan, Mapper.class,
> null, null, job);
>
> This map reduce job works fine but this is just one scan job for this
> map reduce task. What do I have to do to pass multiple scans? Or do
> you have any other suggestions on how to achieve that goal? The
> constraint would be that it must be possible to combine it with map
> reduce.
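For the multiple-dates case specifically, the two suggestions above (an unbounded scan plus FilterList [1]) can be combined: wrap one PrefixFilter per date in a FilterList with MUST_PASS_ONE (OR) semantics. A rough, untested sketch, reusing Utils.md5 and the scan settings from Paul's mail (the method name buildMultiDateScan is just illustrative):

```java
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.PrefixFilter;

// One PrefixFilter per md5(date), OR-ed together over a full-table scan.
Scan buildMultiDateScan(List<String> dates) {
    FilterList dateFilters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    for (String date : dates) {
        dateFilters.addFilter(new PrefixFilter(Utils.md5(date)));
    }
    Scan scan = new Scan();      // unbounded: no start/stop row
    scan.setCaching(500);        // same MR-friendly settings as before
    scan.setCacheBlocks(false);
    scan.setFilter(dateFilters);
    return scan;
}
```

You'd then pass the resulting scan to TableMapReduceUtil.initTableMapperJob exactly as in the quoted code. One caveat: the filters are evaluated server-side, but an unbounded scan still visits every region, so this trades scan-range pruning for the convenience of a single job.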

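As a side note on the start/stop-row construction quoted above: the same prefix-plus-padding arithmetic can be sketched in plain Java with no HBase dependency, which makes the exclusive-stop-row trick easier to see. The class and method names below are illustrative, and Utils.md5 from the thread is assumed to return the raw 16-byte MD5 digest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class RowKeyRange {
    static final int MD5_LENGTH = 16;  // bytes in an MD5 digest
    static final int LONG_LENGTH = 8;  // padding, as in Bytes.padTail(md5Key, 8)

    // Raw 16-byte MD5 digest (what the thread's Utils.md5 is assumed to return).
    static byte[] md5(String s) throws Exception {
        return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
    }

    // Inclusive start row: the 16-byte prefix padded with 8 zero bytes.
    static byte[] startRow(byte[] prefix) {
        return Arrays.copyOf(prefix, MD5_LENGTH + LONG_LENGTH); // copyOf zero-pads
    }

    // Exclusive stop row: bump the last byte of the prefix, as in the quoted code.
    // Caveat: if that byte happens to be 0xFF the increment wraps to 0x00,
    // producing an inverted (empty) range.
    static byte[] stopRow(byte[] prefix) {
        byte[] end = Arrays.copyOf(prefix, MD5_LENGTH + LONG_LENGTH);
        end[MD5_LENGTH - 1]++;
        return end;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = md5("2013-01-07");
        System.out.println("digest bytes: " + key.length);           // 16
        System.out.println("row length: " + startRow(key).length);   // 24
    }
}
```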