Hello HBase community,

We recently switched to HBase 2.2.6 and have noticed that scans are very slow. When we scan a small amount of data (e.g. 100k or 200k rows) we do not encounter any problems, but once the amount of data reaches about 1 million rows the scans become very slow.

For the scans we basically set startRow and endRow and apply different filters. Several threads each request batches of 1000 rows. To get the 1000 rows we keep a counter while calling next(), and when it reaches 1000 we stop the scan by throwing an InterruptedException. This did not cause any problems in HBase 0.94, where we had good performance.

In HBase 2 we saw that there is a setLimit(int) option that tells the region server how many rows the client wants. I also see that it is possible to set a readType, which can be PREAD or STREAM.

- Do you think that setting these options can lead to better scan performance?
- What is the difference between PREAD and STREAM?
- In which cases does it make sense to use PREAD or STREAM?

We have already done some server-side tuning, but we still cannot get good scan performance. When we start working with large amounts of data, we see a lot of "responseTooSlow" warnings on the server side, like:

2021-10-28 16:45:00,854 WARN [RpcServer.default.FPBQ.Fifo.handler=46,queue=1,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)","starttimems":"1635432272849","responsesize":"221799","method":"Scan","param":"scanner_id: 3011016724423115474 number_of_rows: 1000 close_scanner: false next_call_seq: 0 client_handles_partials: true client_handles_heartbeats: tr \u003cTRUNCATED\u003e","processingtimems":28005,"client":"10.200.86.173:60806","queuetimems":0,"class":"HRegionServer","scandetails":"table: mn1_7491_hinvio region: mn1_7491_hinvio .....}
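To make the question concrete, here is a minimal sketch of the kind of scan we are considering, assuming the HBase 2.x client API (hbase-client on the classpath and a reachable cluster). The class name, the row keys, and the choice of PREAD are hypothetical and only for illustration; the table name is the one from the log above. This is not our production code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LimitedScanSketch {

    // Batch size taken from the description above (1000 rows per request).
    static final int BATCH_SIZE = 1000;

    // Fetch at most BATCH_SIZE rows in [startRow, stopRow) and let the
    // scanner finish on its own instead of aborting it with an exception.
    static List<Result> fetchBatch(Table table, byte[] startRow, byte[] stopRow)
            throws IOException {
        Scan scan = new Scan()
                .withStartRow(startRow)
                .withStopRow(stopRow)
                .setLimit(BATCH_SIZE)    // total number of rows this scan returns
                .setCaching(BATCH_SIZE)  // rows requested per RPC
                // ReadType can be DEFAULT, STREAM or PREAD; PREAD is an
                // assumption here, not a recommendation.
                .setReadType(Scan.ReadType.PREAD);
        // Our filters would be attached here with scan.setFilter(...).

        List<Result> rows = new ArrayList<>(BATCH_SIZE);
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                rows.add(result);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mn1_7491_hinvio"))) {
            // Hypothetical row keys, for illustration only.
            List<Result> batch = fetchBatch(table,
                    Bytes.toBytes("row-start"), Bytes.toBytes("row-end"));
            System.out.println("Fetched " + batch.size() + " rows");
        }
    }
}
```

With setLimit the scanner stops by itself once the limit is reached, so the client-side counter and the exception would no longer be needed.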
Thanks, Hamado Dene