Can I specify the range inside of fuzzy rule in FuzzyRowFilter?

Alex Baranau Fri, 17 Aug 2012 13:42:38 -0700

There was a question [1] in a comment on the JIRA issue
https://issues.apache.org/jira/browse/HBASE-6509; it makes more sense to
answer it here.


With the current FuzzyRowFilter I believe the only way to approach the
problem is to add one fuzzy rule per actionid in the range (151 of them,
counting both ends): ??????00200, ??????00201, ..., ??????00350.
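The rules don't have to be typed by hand. Below is a minimal, self-contained sketch that generates the (fuzzy key, mask) byte arrays for an actionid range. It assumes the mask convention from the HBASE-6509 patch (0 = the byte at this position must match, 1 = any byte) and uses `Map.Entry` as a stand-in for HBase's `Pair<byte[], byte[]>`; in real code you would collect `Pair`s into a list and pass it to the `FuzzyRowFilter` constructor. `FuzzyRules` and `rulesForActionIdRange` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FuzzyRules {
    static final int USER_ID_LEN = 6;   // "??????" part: any byte allowed
    static final int ACTION_ID_LEN = 5; // zero-padded actionid, must match

    /** Builds one (key, mask) rule per actionid in [from, to], inclusive. */
    static List<Map.Entry<byte[], byte[]>> rulesForActionIdRange(int from, int to) {
        List<Map.Entry<byte[], byte[]>> rules = new ArrayList<>();
        for (int actionId = from; actionId <= to; actionId++) {
            // e.g. "??????00200"; the '?' bytes are placeholders, ignored under mask 1
            String key = "?".repeat(USER_ID_LEN) + String.format("%05d", actionId);
            byte[] mask = new byte[USER_ID_LEN + ACTION_ID_LEN];
            for (int i = 0; i < USER_ID_LEN; i++) {
                mask[i] = 1; // 1 = this byte is not fixed (assumed convention)
            }
            // The remaining mask bytes stay 0 = must equal the key byte.
            rules.add(new AbstractMap.SimpleEntry<>(
                    key.getBytes(StandardCharsets.US_ASCII), mask));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Map.Entry<byte[], byte[]>> rules = rulesForActionIdRange(200, 350);
        System.out.println(rules.size()); // prints 151
        System.out.println(new String(rules.get(0).getKey(), StandardCharsets.US_ASCII)); // prints ??????00200
    }
}
```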

As for the performance of this approach, I can say the following:
* two "checks" happen for each processed row key (i.e. each row key we
don't skip)
* the first one checks whether the given row key satisfies a fuzzy rule
and, if it doesn't, determines whether there is a next row key to advance
to. This takes at most O(n), where n is the length of the fuzzy rule: it
is done in one simple loop, which can exit before all bytes are checked.
For m rules this is O(m*n).
* the second one calculates the next row key to provide as a hint for
fast-forwarding. We again go through all the rules and pick the smallest
hint. This is also done in one loop, i.e. O(m*n) as well.
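The two checks above can be sketched roughly as follows. This is a simplified, self-contained illustration, not HBase's actual FuzzyRowFilter code: it assumes every row key has the same length as the rules and the 0 = fixed / 1 = any-byte mask convention, and `FuzzyCheck`, `satisfies`, `nextForRule` and `nextHint` are hypothetical names. Per rule, the match loop exits at the first mismatching fixed byte; the hint is the smallest key >= the row that satisfies the rule, and across rules the smallest such key (unsigned byte order) wins.

```java
import java.util.Arrays;
import java.util.List;

public class FuzzyCheck {

    /** Check 1: O(n) per rule, stops at the first mismatching fixed byte. */
    static boolean satisfies(byte[] row, byte[] key, byte[] mask) {
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    /** Smallest key >= row that satisfies one rule, or null if none exists. */
    static byte[] nextForRule(byte[] row, byte[] key, byte[] mask) {
        int n = key.length;
        int i = 0;
        // Find the first fixed position where the row differs from the rule.
        while (i < n && (mask[i] == 1 || row[i] == key[i])) {
            i++;
        }
        if (i == n) {
            return row.clone(); // row already satisfies the rule
        }
        byte[] next = row.clone();
        int from; // first position to reset to the rule's smallest value
        if (Byte.toUnsignedInt(row[i]) < Byte.toUnsignedInt(key[i])) {
            next[i] = key[i]; // jump forward at the mismatching byte
            from = i + 1;
        } else {
            // Row byte is too large: bump the last fuzzy byte before i that isn't 0xFF.
            int j = i - 1;
            while (j >= 0 && (mask[j] == 0 || row[j] == (byte) 0xFF)) {
                j--;
            }
            if (j < 0) {
                return null; // no larger key can satisfy this rule
            }
            next[j] = (byte) (row[j] + 1);
            from = j + 1;
        }
        for (int k = from; k < n; k++) {
            next[k] = mask[k] == 0 ? key[k] : (byte) 0; // smallest matching suffix
        }
        return next;
    }

    /** Check 2: O(m*n) over all rules (rule[0] = key, rule[1] = mask), keeping the smallest hint. */
    static byte[] nextHint(byte[] row, List<byte[][]> rules) {
        byte[] best = null;
        for (byte[][] rule : rules) {
            byte[] candidate = nextForRule(row, rule[0], rule[1]);
            if (candidate != null
                    && (best == null || Arrays.compareUnsigned(candidate, best) < 0)) {
                best = candidate;
            }
        }
        return best;
    }
}
```

For example, with the rule `??????00200`, the row `12345600199` gets the hint `12345600200`, while `12345600201` gets `12345700200` (the next userid), because nothing under userid `123456` can match any more.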

With ~150 fuzzy rules of length 11, applying the filter is roughly
equivalent to a loop of simple checks over about 150*11*2 = 3300 elements
per row key. That might look like a lot, but it can work quite fast, so I'd
just try it.

As for an extension that would be more efficient, it does make sense to
consider implementing one. Let me think more about it and get back to you
with a JIRA issue :). But I'd suggest trying the existing FuzzyRowFilter
first. The results (performance) would give us some food for thought, or
may even turn out to be acceptable for you (hopefully).
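One possible shape for such a range extension, sketched under assumptions: row keys are exactly 11 ASCII bytes (a 6-byte userid followed by a zero-padded 5-digit actionid), so unsigned byte order on the suffix equals numeric order. Instead of ~150 rules, a single check compares the suffix against inclusive lower and upper bounds, and the hint either jumps to the lower bound under the same userid or moves on to the next userid. `RangeSuffixRule` and its methods are hypothetical; this is not an existing HBase filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RangeSuffixRule {
    private final int prefixLen; // length of the "any value" prefix (userid)
    private final byte[] lower;  // inclusive lower bound for the suffix
    private final byte[] upper;  // inclusive upper bound for the suffix

    RangeSuffixRule(int prefixLen, String lower, String upper) {
        this.prefixLen = prefixLen;
        this.lower = lower.getBytes(StandardCharsets.US_ASCII);
        this.upper = upper.getBytes(StandardCharsets.US_ASCII);
    }

    /** Unsigned comparison of the row's suffix with a bound. */
    private int compareSuffix(byte[] row, byte[] bound) {
        return Arrays.compareUnsigned(row, prefixLen, row.length, bound, 0, bound.length);
    }

    /** One O(n) check replaces the per-actionid rules. */
    boolean satisfies(byte[] row) {
        return compareSuffix(row, lower) >= 0 && compareSuffix(row, upper) <= 0;
    }

    /** Smallest key >= row whose suffix is in range, or null if none exists. */
    byte[] nextHint(byte[] row) {
        if (satisfies(row)) {
            return row.clone(); // already in range
        }
        byte[] next = Arrays.copyOf(row, prefixLen + lower.length);
        if (compareSuffix(row, lower) < 0) {
            // Same userid: jump forward to the lower bound.
            System.arraycopy(lower, 0, next, prefixLen, lower.length);
            return next;
        }
        // Suffix is above the range: increment the prefix as an unsigned number.
        int i = prefixLen - 1;
        while (i >= 0 && next[i] == (byte) 0xFF) {
            next[i] = 0;
            i--;
        }
        if (i < 0) {
            return null; // prefix overflow: no larger key exists
        }
        next[i]++;
        System.arraycopy(lower, 0, next, prefixLen, lower.length);
        return next;
    }
}
```

Incrementing the prefix as raw bytes can yield a hint such as `12345:00200` after userid `123459`; that is still a valid seek position, since it sorts right after every `123459...` key.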

> Can I run this kind of filter on HBase 0.92 without doing any significant
> update to the cluster?

Until the next release, you'll have to use FuzzyRowFilter like any other
custom filter (i.e. its class has to be on every RegionServer's
classpath): just grab the patch from HBASE-6509 and copy the filter class.
There's no need to patch and rebuild HBase.

Alex Baranau
------
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1]

Anil Gupta added a comment - 18/Aug/12 04:37
Hi Alex,
I have a question related to this filter. I have a similar filtering
requirement, which would be an extension to FuzzyRowFilter.
Suppose I have the following row key structure: userid_actionid, where
userid is 6 digits and actionid is 5 digits. I would like to get all
the rows with actionid between 00200 and 00350. With the current
FuzzyRowFilter I can search for all the rows with a particular actionid.
Instead of searching for a particular actionid, I would like to search for
a range of actionids.
Does this use case sound like an extension to the current FuzzyRowFilter?
Can I run this kind of filter on HBase 0.92 without doing any significant
update to the cluster? If I develop this kind of filter, what is needed to
run it on all the RSs?
Thanks,
Anil