Nick, if he doesn't specify startKey/endKey in the Scan object and delegates that to a Filter, won't this scan be sent to *all* regions in the system, instead of just one or two?
On Tue, Feb 26, 2013 at 10:12 PM, Nick Dimiduk <[email protected]> wrote:
> Hi Paul,
>
> You want to run multiple scans so that you can filter the previous scan
> results? Am I correct in my understanding of your objective?
>
> First, I suggest you use the PrefixFilter [0] instead of constructing the
> rowkey prefix manually. This looks something like:
>
> byte[] md5Key = Utils.md5( "2013-01-07" );
> Scan scan = new Scan(md5Key);
> scan.setFilter(new PrefixFilter(md5Key));
>
> Yes, that's a bit redundant, but setting the startkey explicitly will save
> you some unnecessary processing.
>
> > This map reduce job works fine but this is just one scan job for this
> > map reduce task. What do I have to do to pass multiple scans?
>
> Do you mean processing on multiple dates? In that case, what you really
> want is a full (unbounded) table scan. Since date is the first part of your
> compound rowkey, there's no prefix and no need for a filter, just use new
> Scan().
>
> In general, you can use multiple filters in a given Scan (or Get). See the
> FilterList [1] for details.
>
> Does this help?
> Nick
>
> [0]: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html
> [1]: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html
>
> On Tue, Feb 26, 2013 at 5:41 AM, Paul van Hoven <[email protected]> wrote:
> > My rowkeys look something like this:
> >
> > md5( date ) + md5( ip address )
> >
> > So an example would be
> > md5( "2013-02-08" ) + md5( "192.168.187.2" )
> >
> > For one particular date I got several rows. Now I'd like to query
> > different dates, for example "2013-01-01" and "2013-02-01" and some
> > other. Additionally I'd like to perform this or these scans in a map
> > reduce job.
> >
> > Currently my map reduce job looks like this:
> >
> > Configuration config = HBaseConfiguration.create();
> > Job job = new Job(config, "ToyJob");
> > job.setJarByClass( PlayWithMapReduce.class );
> >
> > byte[] md5Key = Utils.md5( "2013-01-07" );
> > int md5Length = 16;
> > int longLength = 8;
> >
> > byte[] startRow = Bytes.padTail( md5Key, longLength ); // append "0 0 0 0 0 0 0 0"
> > byte[] endRow = Bytes.padTail( md5Key, longLength );
> > endRow[md5Length-1]++; // last byte gets counted up
> >
> > Scan scan = new Scan( startRow, endRow );
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> >
> > Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
> >     Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
> > scan.setFilter(f);
> >
> > String tableName = "ToyDataTable";
> > TableMapReduceUtil.initTableMapperJob( tableName, scan, Mapper.class,
> >     null, null, job );
> >
> > This map reduce job works fine but this is just one scan job for this
> > map reduce task. What do I have to do to pass multiple scans? Or do
> > you have any other suggestions on how to achieve that goal? The
> > constraint would be that it must be possible to combine it with map
> > reduce.
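[Editor's note] One detail worth flagging in the quoted code: `endRow[md5Length-1]++` silently wraps 0xFF to 0x00 without a carry, which for an unlucky MD5 prefix produces a stop row that sorts *before* the start row. A carry-aware version of the stop-row computation can be sketched in plain Java (no HBase dependency; the class and method names `Md5RowRange` / `prefixStopRow` are illustrative, not part of the HBase API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch of the start/stop-row computation for a fixed-length rowkey prefix.
public class Md5RowRange {

    // md5(...) as used for the rowkey prefix: always 16 bytes.
    public static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is guaranteed by the JDK
        }
    }

    // Exclusive stop row for a prefix scan: the prefix incremented as an
    // unsigned big-endian number, carrying past any 0xFF bytes.
    // Returns null when the prefix is all 0xFF (scan to end of table).
    public static byte[] prefixStopRow(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return stop;
            }
            stop[i] = 0; // carry into the next byte to the left
        }
        return null; // all 0xFF: no upper bound exists
    }
}
```

With HBase this pair would feed `new Scan(startRow, stopRow)`, falling back to the single-argument `new Scan(startRow)` when the stop row is null.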

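[Editor's note] Paul's compound rowkey scheme can also be sketched in plain Java to show why one date's rows are contiguous: every key for a given date shares the same leading 16 MD5 bytes, so a single prefix range covers it, and "multiple dates" reduces to one range (or one Scan) per date. The class and method names below are illustrative, not from his code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch of the compound rowkey: md5(date) || md5(ip), 32 bytes total.
public class CompoundKey {

    public static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is guaranteed by the JDK
        }
    }

    // rowkey = md5(date) followed by md5(ip)
    public static byte[] rowKey(String date, String ip) {
        byte[] d = md5(date); // 16-byte date prefix
        byte[] i = md5(ip);   // 16-byte ip suffix
        byte[] key = Arrays.copyOf(d, d.length + i.length);
        System.arraycopy(i, 0, key, d.length, i.length);
        return key;
    }
}
```

Because the date hash leads, all keys for "2013-02-08" agree on bytes 0..15 regardless of IP, which is exactly what makes Nick's one-prefix-per-date (PrefixFilter or start/stop row) approach work.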