Re: Hbase 2 scan is very slow

Hamado Dene Wed, 01 Dec 2021 12:29:27 -0800

Hello,
We tried setting the readType to STREAM, with a few tweaks, but saw no 
improvement.
However, we noticed that on the HBase server side all requests end up on a 
single regionserver, which looks heavily loaded.
Is there any way to spread the load across multiple regionservers?



Table Regions

| Name (11) | Region Server | ReadRequests (102,867,803) | WriteRequests (18,540,395) | StorefileSize (8.86 GB) | Num.Storefiles (39) | MemSize (0 MB) | Locality | Start Key | End Key | Region State |
| mn1_7491_hinvio,,1634722902576.b434d9d4d1cc24d3a84a1bc5986021a5. | rzv-db11-hd | 0 | 0 | 0 MB | 2 | 0 MB | 1.0 |  | \x00\x00\x03\x81\x00=\xAD/ | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x8D\x00\x16U\xFF,1634985274517.e0785949a0af2490b312eba3e93c8c9c. | rzv-db13-hd | 0 | 0 | 0 MB | 2 | 0 MB | 0.0 | \x00\x00\x03\x8D\x00\x16U\xFF | \x00\x00\x03\x8D\x00P\x89y | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x8D\x00P\x89y,1634989815752.87116235dfc6bceffea506c2b34af608. | rzv-db10-hd | 0 | 0 | 0 MB | 2 | 0 MB | 0.0 | \x00\x00\x03\x8D\x00P\x89y | \x00\x00\x03\x9B\x009p\xB1\x00\x03\xAF\x02 | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x9B\x00P\x8E',1634666513222.c45d41e07edcf14e6861c9748a0eae96. | rzv-db10-hd | 0 | 0 | 0 MB | 2 | 0 MB | 0.0 | \x00\x00\x03\x9B\x00P\x8E' | \x00\x00\x03\xC6\x00X*_\x00\x0C\x99 | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x9B\x009p\xB1\x00\x03\xAF\x02,1634989815752.ee6d00c3d645d69805b8f5e9a51475bb. | rzv-db14-hd | 0 | 0 | 0 MB | 2 | 0 MB | 0.0 | \x00\x00\x03\x9B\x009p\xB1\x00\x03\xAF\x02 | \x00\x00\x03\x9B\x00P\x8E' | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x81\x00=\xAD/,1634722902576.d093a2b994cec6d2fd0d396f293d9a4a. | rzv-db11-hd | 0 | 0 | 0 MB | 2 | 0 MB | 1.0 | \x00\x00\x03\x81\x00=\xAD/ | \x00\x00\x03\x81\x00J\xB9\xEA\x00\x07\xDFg | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x81\x00J\xB9\xEA\x00\x07\xDFg,1634826994671.6dff455e4014a70ab45d12c69bfde65b. | rzv-db13-hd | 0 | 0 | 0 MB | 1 | 0 MB | 0.0 | \x00\x00\x03\x81\x00J\xB9\xEA\x00\x07\xDFg | \x00\x00\x03\x81\x00V\x11U | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x81\x00V\x11U,1634826994671.7f94eef73f9ec0114e7c227b902a889a. | rzv-db14-hd | 0 | 0 | 0 MB | 2 | 0 MB | 0.5 | \x00\x00\x03\x81\x00V\x11U | \x00\x00\x03\x89\x00k\x0E\xBB\x00\x07\x93p | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\x89\x00k\x0E\xBB\x00\x07\x93p,1634985274517.6dd72f28056876483f8f87c325215c01. | rzv-db09-hd | 0 | 0 | 0 MB | 2 | 0 MB | 1.0 | \x00\x00\x03\x89\x00k\x0E\xBB\x00\x07\x93p | \x00\x00\x03\x8D\x00\x16U\xFF | OPEN |
| mn1_7491_hinvio,\x00\x00\x03\xC6\x00X*_\x00\x0C\x99,1636891865485.e24c8ad3ba92a91ff90aa4dc4a8f31b8. | rzv-db12-hd | 2,974,880 | 2,974,879 | 1.58 GB | 13 | 0 MB | 1.0 | \x00\x00\x03\xC6\x00X*_\x00\x0C\x99 | \x00\x00\x04'\x00R\xE1-\x00\x0E\xBD\xCC | OPEN |
| mn1_7491_hinvio,\x00\x00\x04'\x00R\xE1-\x00\x0E\xBD\xCC,1636891865485.1bdec390b9ff84ee5e355cd4abe87417. | rzv-db12-hd | 99,892,923 | 15,565,516 | 7.28 GB | 9 | 0 MB | 1.0 | \x00\x00\x04'\x00R\xE1-\x00\x0E\xBD\xCC |  | OPEN |



For example, for this table we see that all requests always end up on rzv-db12-hd.
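
To illustrate why a scan can land on a single regionserver, here is a minimal plain-Java sketch of how a scan range maps onto regions. It is a toy model, not HBase client code: the class RegionLookup, its methods, and the shortened start keys are all hypothetical, loosely modelled on the table above. The only assumption about HBase itself is that row keys are ordered as unsigned bytes and that a region serves the keys from its start key up to the next region's start key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of region lookup, NOT HBase client code. All names are hypothetical.
class RegionLookup {
    // Row keys sort as unsigned bytes, so \xC6 sorts after \x8D.
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start key -> hosting regionserver.
    private final NavigableMap<byte[], String> regions = new TreeMap<>(UNSIGNED);

    void addRegion(byte[] startKey, String server) {
        regions.put(startKey, server);
    }

    // Servers whose regions overlap the scan range [startRow, stopRow).
    // An empty stopRow is treated as "scan to the end of the table".
    List<String> serversFor(byte[] startRow, byte[] stopRow) {
        List<String> servers = new ArrayList<>();
        if (regions.isEmpty()) {
            return servers;
        }
        // The region containing startRow has the greatest start key <= startRow.
        byte[] first = regions.floorKey(startRow);
        if (first == null) {
            first = regions.firstKey();
        }
        for (Map.Entry<byte[], String> e : regions.tailMap(first, true).entrySet()) {
            if (stopRow.length > 0 && UNSIGNED.compare(e.getKey(), stopRow) >= 0) {
                break; // this region starts at or after the end of the scan
            }
            servers.add(e.getValue());
        }
        return servers;
    }

    public static void main(String[] args) {
        RegionLookup lookup = new RegionLookup();
        // Shortened, illustrative start keys (not the full keys from the table).
        lookup.addRegion(new byte[0], "rzv-db11-hd");
        lookup.addRegion(new byte[] {0, 0, 3, (byte) 0x8D}, "rzv-db13-hd");
        lookup.addRegion(new byte[] {0, 0, 3, (byte) 0xC6}, "rzv-db12-hd");
        lookup.addRegion(new byte[] {0, 0, 4, 0x27}, "rzv-db12-hd");

        // A scan whose start and stop rows share the prefix \x00\x00\x04.
        byte[] start = {0, 0, 4, 0x27, 0};
        byte[] stop = {0, 0, 4, 0x28};
        System.out.println(lookup.serversFor(start, stop)); // prints [rzv-db12-hd]
    }
}
```

If every scan uses start and stop rows under the same key prefix, every request resolves to the same region, and therefore to whichever server hosts it.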

Thanks,
Hamado Dene
 

    On Sunday, 28 November 2021, 17:54:27 CET, Hamado Dene 
<[email protected]> wrote:  
 
 Yes, we create a FilterList, add the two filters I mentioned in the previous 
mail combined with AND, and then call scan.setFilter(filterList).
We will try the following changes:
- Set a limit with scan.setLimit(1000), because each scan only needs the first 
  1000 rows that satisfy those filters.
- Force the readType to STREAM.
We will read the issue you mentioned to understand whether we are doing 
something wrong. We realized that the only scans that are very slow are those 
with filters, so most likely we are doing something wrong, or something that 
does not perform well on HBase 2.
I'll let you know if we can improve our situation.
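
As a rough illustration of what "two filters combined with AND, plus a limit of 1000" means for the amount of work done, here is a small plain-Java model. It is not HBase code and does not claim to reflect how the regionserver evaluates filters internally: FilteredScanModel, the two-column row layout (status, timestamp), and the data distribution are all assumptions made up for the example. It only shows that the number of rows examined can be far larger than the number of rows returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a filtered scan with a row limit, NOT HBase code.
class FilteredScanModel {
    // Matching rows plus the number of rows that had to be examined.
    static final class Result {
        final List<long[]> rows = new ArrayList<>();
        int examined;
    }

    // Walks the table in order, keeps rows that pass ALL filters (AND),
    // and stops as soon as 'limit' matching rows have been collected.
    static Result scan(List<long[]> table, List<Predicate<long[]>> filters, int limit) {
        Result result = new Result();
        for (long[] row : table) {
            if (result.rows.size() >= limit) {
                break;
            }
            result.examined++;
            boolean passesAll = true;
            for (Predicate<long[]> filter : filters) {
                if (!filter.test(row)) {
                    passesAll = false;
                    break;
                }
            }
            if (passesAll) {
                result.rows.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: row = {status, timestamp}; 1 row in 100 has status 1.
        List<long[]> table = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            table.add(new long[] {i % 100 == 0 ? 1L : 0L, i});
        }
        List<Predicate<long[]>> filters = List.of(
                row -> row[0] == 1L,        // stands in for the EQUAL filter
                row -> row[1] <= 500_000L); // stands in for the LESS_OR_EQUAL filter

        Result r = scan(table, filters, 1000);
        // prints returned=1000 examined=99901
        System.out.println("returned=" + r.rows.size() + " examined=" + r.examined);
    }
}
```

In this model a limit bounds the rows returned, not the rows read, so a selective filter can still force a long walk before the limit is reached.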

Thanks,
Hamado Dene

    On Saturday, 27 November 2021, 14:39:30 CET, 张铎(Duo Zhang) 
<[email protected]> wrote:  
 
 scan.setFilter(List.of(res1, res2));

What is the 'List' here? You mean FilterList? How do you combine these two
filters, AND or OR?

We have done a bunch of fixes around the semantics of FilterList, please see
this issue

https://issues.apache.org/jira/browse/HBASE-18410

Maybe it affects your usage.

Thanks.



Hamado Dene <[email protected]> wrote on Saturday, 27 November 2021 at 9:06 PM:

>  Thank you in advance for the information you are giving us. As for the
> filters, in this case we set two filters:
>
> org.apache.hadoop.hbase.filter.SingleColumnValueFilter res1 =
> new org.apache.hadoop.hbase.filter.SingleColumnValueFilter(family,
> colQualifier,
> org.apache.hadoop.hbase.filter.CompareFilter.CompareOp.EQUAL,
> intValueToBytes);
> res1.setFilterIfMissing(true);
> res1.setLatestVersionOnly(true);
>
> org.apache.hadoop.hbase.filter.SingleColumnValueFilter res2 =
> new org.apache.hadoop.hbase.filter.SingleColumnValueFilter(family,
> colQualifier,
> org.apache.hadoop.hbase.filter.CompareFilter.CompareOp.LESS_OR_EQUAL,
> longValueToBytes);
> res2.setFilterIfMissing(true);
> res2.setLatestVersionOnly(true);
>
> scan.setFilter(List.of(res1, res2));
>
> What do you think about these filters? We left them unchanged from
> HBase 0.94; could they have a negative impact on HBase 2?
> As for readType, we can try to force it to STREAM.
> Thanks,
> Hamado Dene
>
>
>    On Saturday, 27 November 2021, 13:13:55 CET, 张铎(Duo Zhang) <
> [email protected]> wrote:
>
>  The behavior for filters has been changed a lot between 0.94 and 2.x. Mind
> providing more information about what filter you use?
>
> And for large scans, STREAM can perform better than PREAD. The DEFAULT
> option means start from PREAD first and change to STREAM if we read enough
> data.
>
> The responseTooSlow logs are normal if you are doing large scans, as it
> will cost several seconds for a single rpc call. Maybe we should try to
> make logging smarter...
>
> Thanks.
>
> Hamado Dene <[email protected]> wrote on Saturday, 27 November 2021 at 4:50 PM:
>
> >
> >  Hello HBase community,
> > We recently switched to HBase 2.2.6 and have noticed that scans are very
> > slow. When we scan a very small amount of data (e.g. 100k, 200k) we do not
> > encounter any problems, but when the amount of data reaches 1 million, the
> > scans become very slow.
> > For the scans we basically set startRow and endRow and apply different
> > filters. Several threads each request batches of 1000 rows. To get the
> > 1000 rows we keep a counter while calling next(), and when it reaches 1000
> > we close the scan with an InterruptedException. This gave us no problems
> > in HBase 0.94, where we had good performance.
> > In HBase 2 we saw that there is a setLimit(int) option to tell the
> > regionserver how many rows the client wants. I also see that it is
> > possible to set a readType, which can be PREAD or STREAM.
> > - Do you think that setting this option can lead to better scan
> >   performance?
> > - What is the difference between PREAD and STREAM?
> > - In which cases does it make sense to use PREAD or STREAM?
> > We have already done some HBase server-side tuning, but we still can't get
> > good scan performance. When we start working with large amounts of data,
> > we see a lot of server-side "responseTooSlow" warnings, like:
> > 2021-10-28 16:45:00,854 WARN [RpcServer.default.FPBQ.Fifo.handler=46,queue=1,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)","starttimems":"1635432272849","responsesize":"221799","method":"Scan","param":"scanner_id: 3011016724423115474 number_of_rows: 1000 close_scanner: false next_call_seq: 0 client_handles_partials: true client_handles_heartbeats: tr \u003cTRUNCATED\u003e","processingtimems":28005,"client":"10.200.86.173:60806","queuetimems":0,"class":"HRegionServer","scandetails":"table: mn1_7491_hinvio region: mn1_7491_hinvio .....}
> >
> > Thanks,
> > Hamado Dene
>
