I added test code that reproduces the issue.
You can find it here [1].

It is not a polished test.
I wrote it only so that you can check the metric values.

When you run the code, you can see the following output.

---
countOfRowsScanned: 1, countOfRowsFiltered: 0
countOfRowsScanned: 1, countOfRowsFiltered: 1
---

Thanks.

[1]: https://github.com/mwkang/hbase/blob/b07bab436ebea7d4e9e8a6df52ef99dd7de6b761/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRowsFilteredMetrics.java

________________________________________
From: Kang Minwoo <minwoo.k...@outlook.com>
Sent: Wednesday, November 13, 2024 13:45
To: user@hbase.apache.org
Subject: Re: the number of rowsFiltered is increased, even not filtered

Thank you for your reply.

I will check whether I can write a UT and, if so, submit it.
Before that, I have a question about the following section:

if (isEmptyRow || ret == FilterWrapper.FilterRowRetCode.EXCLUDE || filterRow()) {
  incrementCountOfRowsFilteredMetric(scannerContext);
}

Would the issue be solved by calling incrementCountOfRowsFilteredMetric only
when ret == EXCLUDE or filterRow() is true, as shown below?

if (isEmptyRow || ret == FilterWrapper.FilterRowRetCode.EXCLUDE || filterRow()) {
  if (ret == FilterWrapper.FilterRowRetCode.EXCLUDE || filterRow()) {
    incrementCountOfRowsFilteredMetric(scannerContext);
  }
}
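
To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. It is not HBase code: the class RowsFilteredSketch, its RetCode enum (a stand-in for FilterWrapper.FilterRowRetCode), and both helper methods are hypothetical names used only for illustration, and the filterRow() result is passed in as a plain boolean so that it is evaluated once.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RowsFilteredSketch {
    // Hypothetical stand-in for FilterWrapper.FilterRowRetCode.
    public enum RetCode { NOT_CALLED, INCLUDE, EXCLUDE }

    // Models the current condition: an empty row alone bumps the counter.
    public static boolean countsAsFilteredCurrent(boolean isEmptyRow, RetCode ret, boolean filterRow) {
        return isEmptyRow || ret == RetCode.EXCLUDE || filterRow;
    }

    // Models the proposed condition: only a filter decision bumps the counter.
    // isEmptyRow is accepted for symmetry but intentionally ignored.
    public static boolean countsAsFilteredProposed(boolean isEmptyRow, RetCode ret, boolean filterRow) {
        return ret == RetCode.EXCLUDE || filterRow;
    }

    public static void main(String[] args) {
        AtomicLong current = new AtomicLong();
        AtomicLong proposed = new AtomicLong();

        // Case from this thread: the row yields no cells (for example, only a
        // DeleteFamily marker) and no filter rejected it.
        boolean isEmptyRow = true;
        RetCode ret = RetCode.NOT_CALLED;
        boolean filterRow = false;

        if (countsAsFilteredCurrent(isEmptyRow, ret, filterRow)) {
            current.incrementAndGet();
        }
        if (countsAsFilteredProposed(isEmptyRow, ret, filterRow)) {
            proposed.incrementAndGet();
        }

        System.out.println("current condition, rowsFiltered:  " + current.get());
        System.out.println("proposed condition, rowsFiltered: " + proposed.get());
    }
}
```

In this model the empty row is counted under the current condition and not under the proposed one, while a row that the filter excludes is counted in both cases. Whether the real scan path has other branches that also need adjusting is not covered by this sketch.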

Thanks.

________________________________________
From: 张铎(Duo Zhang) <palomino...@gmail.com>
Sent: Monday, November 11, 2024 22:28
To: user@hbase.apache.org
Subject: Re: the number of rowsFiltered is increased, even not filtered

Could you provide a UT to represent this behavior?

Looking at the code, I agree with you that this seems like a bug. But
the scan logic is a bit complicated, I'm not sure whether we have
other ways to make the result correct...

Thanks.

On Monday, November 11, 2024 at 20:16, Kang Minwoo <minwoo.k...@outlook.com> wrote:
>
> Hello Community,
>
> According to HBASE-5980 [1], rowsFiltered should count only the rows
> excluded by the filter.
> However, if the current cell is a DeleteFamily marker, rowsFiltered still
> increases because populateResult leaves results empty.
>
> ------------
>
> // HRegion.RegionScannerImpl#nextInternal
> private boolean nextInternal(List<Cell> results, ScannerContext scannerContext) throws IOException {
>   // current=..../DeleteFamily/vlen=?/seqid=?
>   Cell current = this.storeHeap.peek();
>
>   // hasFilterRow = false
>   boolean hasFilterRow = this.filter != null && this.filter.hasFilterRow();
>
>   if (joinedContinuationRow == null) {
>     // results is empty.
>     populateResult(results, this.storeHeap, scannerContext, current);
>
>     Cell nextKv = this.storeHeap.peek();
>     shouldStop = shouldStop(nextKv);
>     // isEmptyRow = true
>     final boolean isEmptyRow = results.isEmpty();
>
>     if (isEmptyRow || ret == FilterWrapper.FilterRowRetCode.EXCLUDE || filterRow()) {
>       // rowsFiltered++ (Because isEmptyRow=true)
>       incrementCountOfRowsFilteredMetric(scannerContext);
>     }
>   }
> }
>
> ------------
>
> I tried to use rowsFiltered to check the number of rows filtered by the 
> filter, but it seems that there is no way to check it currently.
> I wonder if this behavior is intentional.
>
> Thanks.
>
> [1]: https://issues.apache.org/jira/browse/HBASE-5980
