Thank you in advance for the information you are giving us. As for the filters, in this case we set two filters:
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

    SingleColumnValueFilter res1 =
            new SingleColumnValueFilter(family, colQualifier, CompareOp.EQUAL, intValueToBytes);
    res1.setFilterIfMissing(true);
    res1.setLatestVersionOnly(true);

    SingleColumnValueFilter res2 =
            new SingleColumnValueFilter(family, colQualifier, CompareOp.LESS_OR_EQUAL, longValueToBytes);
    res2.setFilterIfMissing(true);
    res2.setLatestVersionOnly(true);

    // Both conditions must hold, so the two filters are combined in a MUST_PASS_ALL FilterList.
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(res1);
    filters.addFilter(res2);
    scan.setFilter(filters);

What do you think about these filters? We left them unchanged from HBase 0.94, so could they have a negative impact on HBase 2? As for readType, we can try forcing it to STREAM.

Thanks,
Hamado Dene

On Saturday, 27 November 2021 at 13:13:55 CET, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

The behavior of filters has changed a lot between 0.94 and 2.x. Mind providing more information about which filters you use?

And for large scans, STREAM can perform better than PREAD. The DEFAULT option means we start with PREAD first and switch to STREAM once enough data has been read.

The responseTooSlow logs are normal if you are doing large scans, as a single RPC call can take several seconds. Maybe we should try to make the logging smarter...

Thanks.

Hamado Dene <hamadod...@yahoo.com.invalid> wrote on Saturday, 27 November 2021 at 4:50 PM:

> Hello HBase community,
> We have recently switched to HBase 2.2.6 and have noticed that scans are
> very slow. When we scan a very small amount of data (e.g. 100k or 200k rows)
> we do not encounter any problems, but when the amount of data reaches
> 1 million rows, the scans become very slow. For the scans we basically set
> startRow and endRow and apply different filters. Several threads always
> request batches of 1000 rows. To get the 1000 rows, we keep a counter while
> calling next(), and when we reach 1000 we close the scan with an
> InterruptedException. This didn't give us any problems in HBase 0.94 and we
> had good performance.
> In HBase 2 we saw that there is a setLimit(int) option to tell the region
> server the number of rows it wants. We also see that it is possible to set a
> readType, which can be PREAD or STREAM.
> - Do you think that setting this option can lead to better scan performance?
> - What is the difference between PREAD and STREAM?
> - In which cases does it make sense to use PREAD or STREAM?
> We have already done some HBase server-side tuning, but we still can't get
> good scan performance. When we start working with large amounts of data, we
> start to see a lot of server-side "responseTooSlow" warnings, like:
>
> 2021-10-28 16:45:00,854 WARN [RpcServer.default.FPBQ.Fifo.handler=46,queue=1,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)","starttimems":"1635432272849","responsesize":"221799","method":"Scan","param":"scanner_id: 3011016724423115474 number_of_rows: 1000 close_scanner: false next_call_seq: 0 client_handles_partials: true client_handles_heartbeats: tr\u003cTRUNCATED\u003e","processingtimems":28005,"client":"10.200.86.173:60806","queuetimems":0,"class":"HRegionServer","scandetails":"table: mn1_7491_hinvio region: mn1_7491_hinvio .....}
>
> Thanks,
> Hamado Dene
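
For reference, the readType discussed above is set directly on the Scan object in the HBase 2.x client. A minimal sketch, assuming the standard org.apache.hadoop.hbase.client.Scan API; the start and stop row values are placeholders, not the actual application keys:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    Scan scan = new Scan()
            .withStartRow(Bytes.toBytes("rowA"))   // placeholder start row
            .withStopRow(Bytes.toBytes("rowZ"))    // placeholder stop row
            // DEFAULT starts with PREAD and switches to STREAM once enough data has
            // been read; STREAM forces streaming reads from the first request, which
            // can help large scans; PREAD keeps positional reads throughout.
            .setReadType(Scan.ReadType.STREAM);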
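
And a sketch of the setLimit(int) option mentioned in the original mail, replacing the counter-plus-exception pattern with a server-side row limit and a try-with-resources close of the scanner. This assumes an existing org.apache.hadoop.hbase.client.Connection named connection; the table name is taken from the log above purely as an example, and the row handling is a placeholder:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    try (Table table = connection.getTable(TableName.valueOf("mn1_7491_hinvio"))) {
        Scan scan = new Scan()
                .setLimit(1000)                      // region server stops after 1000 rows
                .setReadType(Scan.ReadType.STREAM);  // optional, as discussed above
        // try-with-resources closes the scanner cleanly, instead of counting next()
        // calls and aborting with an exception once 1000 rows have been read.
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                // process the row here
            }
        }
    }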