[jira] [Comment Edited] (HBASE-20565) ColumnRangeFilter combined with ColumnPaginationFilter can produce incorrect result since 1.4

2018-07-17 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546584#comment-16546584
 ] 

Zheng Hu edited comment on HBASE-20565 at 7/17/18 1:19 PM:
---

Uploaded patch.v1. Let me explain the core idea:

Assume that filterList = filter-A AND filter-B AND filter-C AND ...
If a cell has been filtered out by filter-A, there is no need to
pass it to filter-B or filter-C. Only the cells included by filter-A
should be passed to filter-B, and only the cells included by both filter-A
and filter-B should be passed to filter-C.

The max rule still works, but only the include* return codes should be
merged into a max return code.
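
To make the idea concrete, here is a minimal sketch of a short-circuiting AND evaluation. It is not the patch itself and does not use the HBase Filter API: the `Code` enum, the `CellFilter` interface, and the `evaluate` method are hypothetical stand-ins, and the ordering of the include* codes (a larger code means a larger forward step) is an assumption made for illustration.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; this is not the HBase Filter API.
class AndFilterListSketch {

    // Assumed ordering: within each group, a later constant means a bigger forward step.
    enum Code {
        INCLUDE, INCLUDE_AND_NEXT_COL, INCLUDE_AND_NEXT_ROW, // include* codes
        SKIP, NEXT_COL, NEXT_ROW;                             // exclude codes

        boolean isInclude() {
            return compareTo(INCLUDE_AND_NEXT_ROW) <= 0;
        }
    }

    interface CellFilter {
        Code filter(String column);
    }

    // Evaluates the filters in order. As soon as one filter excludes the cell,
    // the remaining filters are not consulted, so their state is left untouched.
    // Only include* codes are merged with the max rule.
    static Code evaluate(List<CellFilter> filters, String column) {
        Code merged = Code.INCLUDE;
        for (CellFilter f : filters) {
            Code rc = f.filter(column);
            if (!rc.isInclude()) {
                return rc; // short-circuit: later filters never see this cell
            }
            if (rc.compareTo(merged) > 0) {
                merged = rc; // max rule over include* codes only
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] calls = {0}; // how many cells the second filter has been shown
        List<CellFilter> filterList = List.of(
            c -> c.compareTo("c2") >= 0 ? Code.INCLUDE : Code.SKIP,
            c -> { calls[0]++; return Code.INCLUDE_AND_NEXT_COL; });

        System.out.println(evaluate(filterList, "c0") + ", second filter calls: " + calls[0]);
        System.out.println(evaluate(filterList, "c3") + ", second filter calls: " + calls[0]);
    }
}
```

In this sketch the second filter is never shown `c0`, because the first filter already excluded it, so any counter the second filter keeps is not advanced by a cell that will not be in the result.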

The problem is that the order of the filters may change which cells are
returned, so we need to tell the user explicitly to place the count-related
filters in the last position. By analogy with SQL syntax, we accept a statement such as:
{code}
select * from table where xxx and yyy limit 1, 100
{code}
where the limit is at the end of the statement.

A statement such as:
{code}
select * from table where xxx limit 1, 1000 and yyy
{code}
will not be accepted. 
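
The effect of filter order under short-circuit evaluation can be sketched with a small simulation. Everything here is a simplified, hypothetical model (the `Code` enum, `CellFilter`, `range`, `pagination`, and `scan` are illustration names, not HBase classes), and seek hints are left out. It only shows that a count-related filter placed first counts cells that a later filter then drops.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model; not HBase code.
class FilterOrderSketch {

    enum Code { INCLUDE, SKIP, NEXT_ROW }

    interface CellFilter {
        Code filter(String column);
    }

    // Stateless filter: includes columns within [min, max].
    static CellFilter range(String min, String max) {
        return c -> (c.compareTo(min) >= 0 && c.compareTo(max) <= 0) ? Code.INCLUDE : Code.SKIP;
    }

    // Count-related filter: every cell it is shown advances its counter.
    static CellFilter pagination(int limit, int offset) {
        int[] count = {0};
        return c -> {
            if (count[0] >= offset + limit) {
                return Code.NEXT_ROW;
            }
            Code rc = count[0] < offset ? Code.SKIP : Code.INCLUDE;
            count[0]++;
            return rc;
        };
    }

    // Short-circuit AND over one row: a cell excluded by an earlier filter
    // is not passed to the later filters.
    static List<String> scan(List<String> columns, List<CellFilter> filters) {
        List<String> result = new ArrayList<>();
        row:
        for (String c : columns) {
            for (CellFilter f : filters) {
                Code rc = f.filter(c);
                if (rc == Code.NEXT_ROW) {
                    break row;
                }
                if (rc == Code.SKIP) {
                    continue row;
                }
            }
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9");
        // Count-related filter last.
        System.out.println(scan(cols, List.of(range("c2", "c9"), pagination(5, 0))));
        // Count-related filter first.
        System.out.println(scan(cols, List.of(pagination(5, 0), range("c2", "c9"))));
    }
}
```

In this model, placing the pagination filter last yields five columns (c2 to c6), while placing it first yields only three (c2 to c4), because c0 and c1 have already used up two of the five slots.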


was (Author: openinx):
Uploaded patch.v1, and pasted the discussion with [~anoop.hbase] ... 

> What if the order of filters be opposite way in FL?
A good question. I think we need to tell the user explicitly to place the 
count-related filters in the last position. In SQL syntax, we accept a 
statement such as: select * from table where xxx and xxx limit 1, 100, where 
the limit is at the end of the statement. A statement such as: select * from 
table where xxx limit 1, 1000 and xx will not be accepted.  
I think it's reasonable to require that the count-related filters be put at 
the end of the sub-filters. 


On Fri, May 25, 2018 at 6:25 PM, Anoop John  wrote:
> if  a cell has been filtered out by filter-A,  then no need to
pass the cell to filter-B and filter-C,  only the included cell set of
filter-A should be passed to filter-B, and only the included cell set
of filter-A & filter-B should be passed to filter-C ...

U mean u propose such a change now?  Then the order of filters matters
right?  Say the count based filter is coming second and the other
(which can filter out some cells) come as 1st, it will work. What if
the order of filters be opposite way in FL?

-Anoop-

On Fri, May 25, 2018 at 12:29 PM, OpenInx  wrote:
> I have to admit that my previous solution was one-sided...
> Not only ColumnPaginationFilter has the problem; other counter-related
> filters have it too.
>
>> We have 2 filters in a FL. We pass cell 1 and 2. First filter select cell1
>> but been filtered out by F2.  Now we need to tell both filters that we
>> have excludes this cell.  This will be useful for filters which work on
>> counting  basis.  It can reduce the counter which it would have advanced.
>> Pls see the possibility.
>
> Assume that FilterList = filter-A AND ColumnCountGetFilter. If cell x
> has been filtered out by filter-A, then what return code should
> ColumnCountGetFilter#filterKeyValue return?
> In theory, the count in ColumnCountGetFilter should not be incremented when
> checking cell x. So what is the purpose of passing cell x to
> ColumnCountGetFilter#filterKeyValue?
> Is it to get the return code from ColumnCountGetFilter so as to maximize the
> forward step?
>
> Now I'm thinking that the implementation in branch-1.2 is more reasonable.
> Assume that filterList = filter-A AND filter-B AND filter-C AND ...
> If a cell has been filtered out by filter-A, there is no need to
> pass it to filter-B or filter-C. Only the cells included by
> filter-A should be passed to filter-B, and only the cells included by both
> filter-A and filter-B should be passed to filter-C.
>
> The max rule still works, but only the include* return codes should be
> merged into a max return code.
>
> I think these semantics are more reasonable.
>
>
> On Thu, May 24, 2018 at 4:31 PM, Anoop John  wrote:
>>
>> The offset is the cell offset in a row. This says we have already fetched
>> up to there. So if there is another filter along with this pagination
>> filter, it must be hard for the pagination filter to decide the column
>> offset for the next request. So ideally the column offset might work
>> there.
>> But the issue is that we cannot really generalize this. It depends on the way
>> the column offset and column value offset are implemented in the pagination
>> filter.
>>
>> I am thinking that we need a generic framework change now. If we pass
>> all cells to all filters (which is also correct), then there should be a way
>> to tell all filters later that we have decided that this cell is
>> not included in the result.
>>
>> We have 2 filters in a FL. We pass cell 1 and 2. First filter select cell1
>> but been filtered out by F2.  Now we need to tell both filters that we have
>> excludes this 

[jira] [Comment Edited] (HBASE-20565) ColumnRangeFilter combined with ColumnPaginationFilter can produce incorrect result since 1.4

2018-05-17 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473738#comment-16473738
 ] 

Zheng Hu edited comment on HBASE-20565 at 5/18/18 3:10 AM:
---

It's complex now. If a FilterListWithAND has two sub-filters A and B, and B has 
row-level global state (such as a column offset), then after HBASE-18410, which 
optimized the forward step to be the maximum, we consider the return codes from 
the sub-filters at the global level. This means that a cell may not be included 
by sub-filter A, but we still need to pass the cell to sub-filter B to 
calculate B's return code (for the global return code), and in the end B's 
row-level global state gets messed up. 


was (Author: openinx):
It's complex now. If a FilterListWithAND has two sub-filters A and B, and B has 
row-level global state (such as a column limit), then after HBASE-18410, which 
optimized the forward step to be the maximum, we consider the return codes from 
the sub-filters at the global level. This means that a cell may not be included 
by sub-filter A, but we still need to pass the cell to sub-filter B to 
calculate B's return code (for the global return code), and in the end B's 
row-level global state gets messed up. 

> ColumnRangeFilter combined with ColumnPaginationFilter can produce incorrect 
> result since 1.4
> -
>
> Key: HBASE-20565
> URL: https://issues.apache.org/jira/browse/HBASE-20565
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 1.4.4
> Reporter: Jerry He
> Assignee: Zheng Hu
> Priority: Major
> Attachments: debug.diff, debug.log, test-branch-1.4.patch
>
>
> When ColumnPaginationFilter is combined with ColumnRangeFilter, we may see 
> incorrect results.
> Here is a simple example.
> One row with 10 columns c0, c1, c2, .., c9.  I have a ColumnRangeFilter for 
> range c2 to c9.  Then I have a ColumnPaginationFilter with limit 5 and offset 
> 0.  The FilterList is FilterList(Operator.MUST_PASS_ALL, ColumnRangeFilter, 
> ColumnPaginationFilter).
> We expect 5 columns to be returned.  But in HBase 1.4 and later, 4 columns 
> are returned.
> In 1.2.x, the correct 5 columns are returned.
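
The reported scenario can be modelled in a few lines of plain Java. This is only a sketch under assumptions, not HBase code: `PaginationStateSketch` and `scan` are hypothetical names, the range filter is assumed to answer cells before the range with a seek to the first column in range, and the pagination counter is assumed to advance on every cell it is shown. The `passExcludedCells` flag switches between showing every cell to the pagination counter (the behaviour described for 1.4) and showing it only cells the range filter included (the behaviour described for branch-1.2).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of one row scan; not HBase code.
class PaginationStateSketch {

    static List<String> scan(List<String> columns, String min, String max,
                             int limit, int offset, boolean passExcludedCells) {
        List<String> result = new ArrayList<>();
        int count = 0; // row-level state of the pagination filter
        int i = 0;
        while (i < columns.size()) {
            String c = columns.get(i);
            boolean beforeRange = c.compareTo(min) < 0;
            boolean afterRange = c.compareTo(max) > 0;
            boolean inRange = !beforeRange && !afterRange;

            boolean pageInclude = false;
            boolean pageDone = false;
            // The pagination counter advances on every cell it is shown.
            if (inRange || passExcludedCells) {
                if (count >= offset + limit) {
                    pageDone = true;
                } else {
                    pageInclude = count >= offset;
                    count++;
                }
            }

            if (afterRange || pageDone) {
                break; // nothing more to return from this row
            }
            if (beforeRange) {
                // Assumed seek hint: jump to the first column inside the range.
                while (i < columns.size() && columns.get(i).compareTo(min) < 0) {
                    i++;
                }
                continue;
            }
            if (pageInclude) {
                result.add(c);
            }
            i++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9");
        System.out.println(scan(cols, "c2", "c9", 5, 0, true));
        System.out.println(scan(cols, "c2", "c9", 5, 0, false));
    }
}
```

Under these assumptions, the first call returns four columns (c2 to c5), because c0 was counted before the seek, and the second returns five (c2 to c6), which matches the counts given in the report.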



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)