Take a look at this blog: http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
From your earlier description, the components of your rowkey have fixed length. Thus you can consider using fuzzy row filter.

Cheers

On Jan 14, 2014, at 11:08 PM, Ramon Wang <[email protected]> wrote:

> Hi Ted
>
> Thanks for the quick reply.
>
> With this FuzzyRowFilter, do i still need to pass in startRow and stopRow
> like below when constructing a Scan object?
>
>> Scan(byte [] startRow, byte [] stopRow)
>
>
> Will the FuzzyRowFilter provide us performance like a directly get by row
> when we pass something like "20140101_EN_?"
>
> Cheers
> Ramon
>
>
> On Wed, Jan 15, 2014 at 2:22 PM, Ted Yu <[email protected]> wrote:
>
>> Please take a look at
>> http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html
>>
>> Cheers
>>
>> On Jan 14, 2014, at 10:16 PM, Ramon Wang <[email protected]> wrote:
>>
>>> Hi Folks
>>>
>>> We have a table with fixed pattern row key design, the format for the row
>>> key is YEAR_COUNTRY_randomNumber, for example:
>>>
>>> 20140101_EN_1
>>> 20140101_EN_2
>>> 20140101_EN_3
>>> 20140101_US_1
>>> 20140101_US_2
>>> 20140101_US_3
>>> ...
>>>
>>> Is there a way i can quickly get the data for "20140101_EN_*" by using Scan
>>> without scan the full table? I think we are probably going to use
>>> the PrefixFilter filter with the Scan object, but the problem is that we
>>> don't know the "startRow" for each scan, any ideas?
>>>
>>> Thanks
>>> Ramon
>>
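
To make the fuzzy row filter suggestion a bit more concrete, here is a minimal sketch of the two byte arrays it works with, for keys shaped like 20140101_EN_1 from the thread. Per the 0.94 Javadoc linked above, the mask holds 0 where the row key byte is fixed and 1 where any byte is allowed. The class FuzzyKeySketch, the FuzzyPair holder, the fromPattern and matches helpers, and the '?' wildcard convention are all hypothetical names made up for this illustration, not part of HBase. The matches method only illustrates the masking idea in plain Java and is not how HBase evaluates the filter internally.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyKeySketch {

    /** Holds a row-key template and its mask (hypothetical helper type). */
    public static final class FuzzyPair {
        public final byte[] key;
        public final byte[] mask;

        FuzzyPair(byte[] key, byte[] mask) {
            this.key = key;
            this.mask = mask;
        }
    }

    /**
     * Builds key and mask from a pattern in which '?' marks a position
     * that may hold any byte. Assumption: a literal '?' never occurs in keys.
     */
    public static FuzzyPair fromPattern(String pattern) {
        byte[] key = pattern.getBytes(StandardCharsets.UTF_8);
        byte[] mask = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            if (key[i] == '?') {
                key[i] = 0;   // placeholder; the value at a fuzzy position is not compared
                mask[i] = 1;  // 1 = any byte allowed here
            }
            // otherwise mask[i] stays 0 = this byte must match exactly
        }
        return new FuzzyPair(key, mask);
    }

    /** Plain-Java illustration: compares only the positions the mask marks as fixed. */
    public static boolean matches(FuzzyPair p, String rowKey) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        if (row.length < p.key.length) {
            return false;
        }
        for (int i = 0; i < p.key.length; i++) {
            if (p.mask[i] == 0 && row[i] != p.key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Any 8-byte date, then the fixed "_EN_" component.
        FuzzyPair p = fromPattern("????????_EN_");
        System.out.println(matches(p, "20140101_EN_1")); // true
        System.out.println(matches(p, "20140101_US_1")); // false

        // With HBase 0.94 on the classpath, the pair would be handed over roughly like:
        //   Scan scan = new Scan();
        //   scan.setFilter(new FuzzyRowFilter(
        //       Arrays.asList(new Pair<byte[], byte[]>(p.key, p.mask))));
    }
}
```

The mask is most useful when the unknown part sits at the start or in the middle of the key, as in "all dates for EN". When the whole leading part is known, as in "20140101_EN_*", the prefix bytes themselves can also serve as the startRow of Scan(byte [] startRow, byte [] stopRow), with the stopRow being the same prefix with its last byte incremented.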
