Thanks, Ted. I had other filters in there, but I wanted to keep it simple and have just one filter for now, then add the rest back one by one until I get everything working.
So I can't have just one filter in a filter list? Kind of makes it hard to debug if I have multiple filters that might be bad (or just one bad and 9 good but can't figure out which is the bad one).

On Thu, Jul 13, 2017 at 5:34 PM, Ted Yu <[email protected]> wrote:

> rowFilter is added to filter list which doesn't contain other filters.
>
> Maybe the snippet doesn't contain all the code in your class ?
>
> On Thu, Jul 13, 2017 at 5:26 PM, S L <[email protected]> wrote:
>
> > I don't understand why my regex doesn't work when scanning hbase.
> > Everything looks good to me but for some reason, it's returning all keys
> > when it should just return the ones I'm requesting.
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(limit);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> > FilterList filters = new FilterList();
> > Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
> >     new RegexStringComparator("100_.*_\\d{10}"));
> > filters.addFilter(rowFilter);
> > scan.setFilter(filters);
> >
> > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> >     Text.class, IntWritable.class, job);
> >
> > The rowkey is stored as a string in hbase. The rowkey is in the format
> > of hash_servername_timestamp, e.g.
> >
> > 0_myserver.mydomain.com_1234567890
> >
> > The hash can be any number from 0-199. In the above filter, I just want
> > to get all elements with hash = 100, but for some reason the scan job
> > appears to return other rowkeys in addition to the ones with hash = 100.
> >
> > I've tried this with jar versions 1.0.1 and 1.2.0-cdh5.7.2. What am I
> > doing wrong that's making the regex not work?
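Not an answer to the FilterList question, but one thing that may be worth ruling out: RowFilter's RegexStringComparator compares via Matcher.find(), i.e. a substring match rather than a full-rowkey match, so an unanchored pattern can match rowkeys where "100_" merely appears somewhere in the middle. A quick sketch with plain java.util.regex (the second rowkey is a hypothetical example I made up, just to show the substring hit; anchoring with ^ and $ rules it out):

```java
import java.util.regex.Pattern;

public class RegexAnchorDemo {
    public static void main(String[] args) {
        // Unanchored pattern, as in the original filter.
        Pattern unanchored = Pattern.compile("100_.*_\\d{10}");
        // Anchored variant: must match from the start of the rowkey to its end.
        Pattern anchored = Pattern.compile("^100_.*_\\d{10}$");

        String wanted   = "100_myserver.mydomain.com_1234567890";
        // Hypothetical rowkey with hash 5 where "100_" occurs inside the servername.
        String unwanted = "5_host100_rack.mydomain.com_1234567890";

        // find() succeeds on a substring match, so the unanchored pattern
        // accepts both rowkeys:
        System.out.println(unanchored.matcher(wanted).find());   // true
        System.out.println(unanchored.matcher(unwanted).find()); // true -- substring hit

        // The anchored pattern only accepts the intended rowkey:
        System.out.println(anchored.matcher(wanted).find());     // true
        System.out.println(anchored.matcher(unwanted).find());   // false
    }
}
```

If this is what's happening, passing "^100_.*_\\d{10}$" to the RegexStringComparator should tighten the scan without any other changes.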
