[ https://issues.apache.org/jira/browse/HBASE-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Antonov resolved HBASE-14397.
-------------------------------------
    Resolution: Fixed

> PrefixFilter doesn't filter all remaining rows if the prefix is longer than rowkey being compared
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14397
>                 URL: https://issues.apache.org/jira/browse/HBASE-14397
>             Project: HBase
>          Issue Type: Improvement
>          Components: Filters
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>            Assignee: Jianwei Cui
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: HBASE-14397-trunk-v1.patch
>
>
> PrefixFilter filters row keys as follows:
> {code}
>   public boolean filterRowKey(Cell firstRowCell) {
>     ...
>     int length = firstRowCell.getRowLength();
>     if (length < prefix.length) return true; // ===> return directly if the prefix is longer
>     ....
>     if ((!isReversed() && cmp > 0) || (isReversed() && cmp < 0)) {
>       passedPrefix = true;
>     }
>     filterRow = (cmp != 0);
>     return filterRow;
>   }
> {code}
> If the prefix is longer than the current row key, PrefixFilter#filterRowKey
> filters the row immediately without comparing it, so it never sets the
> 'passedPrefix' flag, even when the current row sorts after the prefix.
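To make the early return concrete, here is a minimal standalone sketch, assuming a simplified model in which row keys are plain byte[] values rather than HBase Cells and only forward scans are considered. The class ShortRowPrefixCheck and its rowsExamined helper are hypothetical names, not HBase API; Arrays.compareUnsigned (Java 9+) stands in for the unsigned lexicographic comparison HBase uses for row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified model of the filterRowKey logic quoted above:
// row keys are plain byte[] instead of HBase Cells, and only forward scans
// are modelled. This is not the actual HBase class.
class ShortRowPrefixCheck {
    private final byte[] prefix;
    private boolean passedPrefix = false;

    ShortRowPrefixCheck(byte[] prefix) {
        this.prefix = prefix;
    }

    /** Returns true if the row should be skipped. */
    boolean filterRowKey(byte[] row) {
        if (row.length < prefix.length) {
            return true; // early return: passedPrefix is never updated for short rows
        }
        // Unsigned lexicographic comparison of the row's first prefix.length bytes.
        int cmp = Arrays.compareUnsigned(row, 0, prefix.length, prefix, 0, prefix.length);
        if (cmp > 0) {
            passedPrefix = true;
        }
        return cmp != 0;
    }

    /** Once the prefix has been passed, no later row can match. */
    boolean filterAllRemaining() {
        return passedPrefix;
    }

    /** Simulates a forward scan over rows in sorted order; returns how many were examined. */
    static int rowsExamined(byte[] prefix, String... rows) {
        ShortRowPrefixCheck filter = new ShortRowPrefixCheck(prefix);
        int examined = 0;
        for (String row : rows) {
            if (filter.filterAllRemaining()) {
                break; // the scan stops as soon as the prefix has been passed
            }
            examined++;
            filter.filterRowKey(row.getBytes(StandardCharsets.UTF_8));
        }
        return examined;
    }
}
```

For the 'a', 'b', 'c' example that follows, rowsExamined with prefix "aa" visits all three rows, because none of them is long enough to reach the comparison. With rows 'a', 'ab', 'b' it stops after 'ab', the first row at least as long as the prefix that sorts after it.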
> For example, suppose the table contains three rows, 'a', 'b' and 'c', and we
> issue the following scan:
> {code}
> hbase(main):001:0> scan 'test_table', {STARTROW => 'a', FILTER => "(PrefixFilter ('aa'))"}
> {code}
> The region server checks all three rows before returning, even though 'b'
> already sorts after 'aa'. In our production environment, a user issued a
> scan with a PrefixFilter whose prefix was longer than the row keys of the
> following millions of rows, so the region server kept checking rows until it
> hit a row key at least as long as the prefix. This can easily make the client
> time out. To fix this case, it seems we need to compare the prefix with the
> row key every several rows, even when the prefix is longer.
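A rough sketch of the proposed fix, under the same simplified model (byte[] row keys, no HBase Cells): rows shorter than the prefix are still filtered, but every checkInterval-th such row is compared with the prefix over the bytes it actually has, so that passedPrefix can be set. PeriodicPrefixCheck and checkInterval are hypothetical names, not HBase API or configuration, and this is not necessarily how the attached patch works.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the proposed fix (simplified: byte[] row keys, not
// HBase Cells). Not the code from the attached patch.
class PeriodicPrefixCheck {
    private final byte[] prefix;
    private final boolean reversed;
    private final int checkInterval; // hypothetical knob, not an HBase setting
    private int shortRowsSinceCheck = 0;
    private boolean passedPrefix = false;

    PeriodicPrefixCheck(byte[] prefix, boolean reversed, int checkInterval) {
        if (checkInterval < 1) {
            throw new IllegalArgumentException("checkInterval must be >= 1");
        }
        this.prefix = prefix;
        this.reversed = reversed;
        this.checkInterval = checkInterval;
    }

    /** Returns true if the row should be skipped. */
    boolean filterRowKey(byte[] row) {
        if (row.length < prefix.length) {
            if (++shortRowsSinceCheck >= checkInterval) {
                shortRowsSinceCheck = 0;
                // Compare only the bytes the short row actually has.
                int cmp = Arrays.compareUnsigned(row, 0, row.length, prefix, 0, row.length);
                // cmp == 0 means the row is a proper prefix of `prefix`, so it sorts
                // before every matching row: already passed for a reversed scan only.
                if ((!reversed && cmp > 0) || (reversed && cmp <= 0)) {
                    passedPrefix = true;
                }
            }
            return true; // a row shorter than the prefix can never match it
        }
        int cmp = Arrays.compareUnsigned(row, 0, prefix.length, prefix, 0, prefix.length);
        if ((!reversed && cmp > 0) || (reversed && cmp < 0)) {
            passedPrefix = true;
        }
        return cmp != 0;
    }

    /** Once the prefix has been passed, no later row can match. */
    boolean filterAllRemaining() {
        return passedPrefix;
    }

    /** Simulates a scan over rows given in scan order; returns how many were examined. */
    static int rowsExamined(PeriodicPrefixCheck filter, String... rows) {
        int examined = 0;
        for (String row : rows) {
            if (filter.filterAllRemaining()) {
                break; // the scan stops as soon as the prefix has been passed
            }
            examined++;
            filter.filterRowKey(row.getBytes(StandardCharsets.UTF_8));
        }
        return examined;
    }
}
```

For the 'a', 'b', 'c' example with checkInterval set to 1, the simulated scan stops after 'b', the first row that sorts after 'aa', instead of visiting all three rows. Because a short row can never match, the extra comparison only decides whether the scan may stop, and it touches at most row.length bytes; comparing every short row (checkInterval of 1) may therefore be cheap enough in practice, though that would need measuring.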



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
