There was a question [1] in a JIRA comment on https://issues.apache.org/jira/browse/HBASE-6509; it makes more sense to answer it here.

With the current FuzzyRowFilter, I believe the only way to approach the problem is to add about 150 fuzzy rules to the filter: ??????00200, ??????00201, ..., ??????00350.

As for the performance of this approach, I can say the following:

* Two "checks" happen for each processed row key (i.e. each row key we don't skip).
* The first performs a simple check of whether the given row key satisfies the fuzzy rule, and also determines whether there is a next row key to advance to if it does not. The check takes at most O(n), where n is the length of the fuzzy rule: it is done in one simple loop, which can exit before all bytes are checked. For m rules this is O(m*n).
* The second calculates the next row key to provide as a hint for fast-forwarding. We again check all rules and find the smallest hint. This is also done in one loop, so it is O(m*n) as well.

With 150 fuzzy rules of length 11, applying the filter is equivalent to a loop of simple checks over 150*11*2 = 3300 (roughly 3000) elements. This might look like a lot, but it can work quite fast. So I'd just try it.

As for an extension that would be more efficient, it makes sense to consider implementing one. Let me think more about it and get back to you with a JIRA issue :). But I'd suggest you try the existing FuzzyRowFilter first. The resulting performance numbers would give us some food for thought, or may even turn out to be acceptable for you (hopefully).

> Can i run this kind of filter on HBase0.92 without doing any significant update to the cluster

Until the next release, you'll have to use the FuzzyRowFilter like any other custom filter. Just grab the patch from HBASE-6509 and copy the filter. No need to patch & rebuild HBase.

Alex Baranau
------
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1] Anil Gupta added a comment - 18/Aug/12 04:37

Hi Alex, I have a question related to this filter. I have a similar filtering requirement which will be an extension to FuzzyFilterRow.
Suppose, i have the following structure of rowkeys: userid_actionid, where userid is of 6 digit and then actionid is 5 digit. I would like to get all the rows with actionid between 00200 to 00350. With current FuzzyRowFilter i can search for all the rows a particular actionid. Instead of searching for a particular actionid i would like to search for a range of actionid. Does this use case sounds like an extension to current FuzzyRowFilter?

Can i run this kind of filter on HBase0.92 without doing any significant update to the cluster. If i develop this kind of filter then what is needed to run it on all the RS's?

Thanks,
Anil
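
For illustration, here is a minimal sketch of how the rules suggested in the reply could be generated and checked in plain Java. It assumes 11-byte ASCII row keys (6-digit userid followed directly by the 5-digit actionid, as in the ??????00200 patterns) and the mask convention 0 = fixed byte, 1 = any byte. The class FuzzyRangeRules and its methods are hypothetical names, not part of HBase, and the matching loop is a simplified stand-in for the first "check" only; it does not compute the next-row-key hint. Note that the inclusive range 00200 to 00350 yields 151 rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HBase: builds one fuzzy rule per actionid.
final class FuzzyRangeRules {
    static final int USER_ID_LEN = 6;   // assumption: 6-digit userid
    static final int ACTION_ID_LEN = 5; // assumption: 5-digit actionid

    /** A key template plus a mask (0 = byte must match, 1 = any byte). */
    static final class Rule {
        final byte[] key;
        final byte[] mask;
        Rule(byte[] key, byte[] mask) { this.key = key; this.mask = mask; }
    }

    /** Builds rules ??????<from> ... ??????<to>, both ends inclusive. */
    static List<Rule> rulesForActionRange(int from, int to) {
        List<Rule> rules = new ArrayList<>();
        for (int action = from; action <= to; action++) {
            byte[] key = new byte[USER_ID_LEN + ACTION_ID_LEN];
            byte[] mask = new byte[key.length]; // zero-filled: all bytes fixed
            Arrays.fill(key, 0, USER_ID_LEN, (byte) '?');  // placeholder bytes
            Arrays.fill(mask, 0, USER_ID_LEN, (byte) 1);   // userid is fuzzy
            byte[] digits = String.format(Locale.ROOT, "%0" + ACTION_ID_LEN + "d", action)
                    .getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(digits, 0, key, USER_ID_LEN, ACTION_ID_LEN);
            rules.add(new Rule(key, mask));
        }
        return rules;
    }

    /** O(n) check of one rule; exits early on the first mismatching fixed byte. */
    static boolean matches(byte[] rowKey, Rule rule) {
        if (rowKey.length < rule.key.length) {
            return false;
        }
        for (int i = 0; i < rule.key.length; i++) {
            if (rule.mask[i] == 0 && rowKey[i] != rule.key[i]) {
                return false;
            }
        }
        return true;
    }

    /** O(m*n) check over all m rules. */
    static boolean matchesAny(byte[] rowKey, List<Rule> rules) {
        for (Rule rule : rules) {
            if (matches(rowKey, rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = rulesForActionRange(200, 350);
        byte[] rowKey = "12345600275".getBytes(StandardCharsets.US_ASCII);
        System.out.println(rules.size() + " rules, match: " + matchesAny(rowKey, rules));
    }
}
```

Each Rule's key and mask arrays correspond to the row key and fuzzy info pair that the filter from the HBASE-6509 patch takes; check the patch's Javadoc for the exact constructor before wiring them in.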
