Thanks, Stack. I understand that a Get makes a Scan under the covers, but that does not explain the performance difference I observe between a Get of a single row and a Scan of a single row.
My assumption is that the difference comes from the block cache. Get() first checks the block cache; if the row is there, the call finishes and returns. Scan does not check the block cache; it goes to the memstore, and then to the HFile if the row is not in the memstore.

My test program does Get in a loop, for example 1000 Gets. Before the loop I save the start time, and after the 1000 Gets I save the end time, so (endtime - starttime) / loop-count is the time spent in each Get operation. I have the same loop with get() replaced by scan(). The scan() uses startRowKey = endRowKey, so it reads just one row. I ran the test program many times against HBase 1.2.0, and it shows that Scan is 2x slower than Get.

So I want to understand the root cause. I assume get() finds the row in the block cache, so it does not go to the memstore or the HFile. But scan() must go to the HFile, because my test has no put operations, only reads. The row was inserted a long time ago, so it should have been flushed to an HFile and should no longer be in the memstore. I cannot confirm or verify this, though. If it is right, scan() has to send a request to HDFS to read from the HFile, which makes it slower than get().

I can paste the test program if the description is still not clear. If there really is a performance difference, I may need to replace Scan with Get wherever possible; if not, I will not bother to change anything.

thanks,
Ming

-----Original Message-----
From: Stack <st...@duboce.net>
Sent: Saturday, December 29, 2018 11:50 PM
To: Hbase-User <user@hbase.apache.org>
Subject: Re: Will Scan use blockcache?

A Get is a one-row Scan. Under the covers the Get makes a Scan. Scan/Get both have to go to memstore since it will have latest versions of Cells.

Say more about how you are doing the compare please.

S

On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming....@esgyn.cn> wrote:

> Hi, all,
>
> I recently found that a short scan is slower than a get operation in HBase.
> It is acceptable, but I really want to understand the reason.
>
> My testing table only has one row in it, so both Scan and Get just get one
> row. Scan is still about 2x slower than the get operation.
>
> So I want to understand the difference between get(rowkey) and Scan(rowkey,
> rowkey).
>
> I think Get will first look in the blockcache and, if it matches, return
> without accessing HFile/Memstore.
>
> Will Scan search in the blockcache as well? Or does it go directly to
> memstore/HFile?
>
> thanks,
>
> Ming
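
The timing method described in this thread, (endtime - starttime) / loop-count, can be sketched roughly as below. This is only an illustration, not the actual test program: the class name LoopTimer, the method averageNanos, and the warm-up count are assumptions, and the timed operation is a placeholder standing in for the HBase get or the single-row scan.

```java
// Minimal timing harness: average wall-clock time per call over a loop.
// LoopTimer and averageNanos are hypothetical names for illustration only.
final class LoopTimer {

    // Runs op `warmup` times untimed, then `loops` times timed, and returns
    // the mean elapsed nanoseconds per timed call.
    static double averageNanos(Runnable op, int warmup, int loops) {
        if (loops <= 0) {
            throw new IllegalArgumentException("loops must be positive");
        }
        // Untimed warm-up so one-time costs are not charged to the first calls
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            op.run();
        }
        long end = System.nanoTime();
        // (endtime - starttime) / loop-count
        return (end - start) / (double) loops;
    }

    public static void main(String[] args) {
        // Placeholder work; in the real test this would be the get or the scan
        Runnable placeholder = () -> Math.sqrt(12345.678);
        double perCall = averageNanos(placeholder, 100, 1000);
        System.out.println("average ns per call: " + perCall);
    }
}
```

For a like-for-like comparison, the operation passed in for the scan case would presumably open the scanner, read the one row, and close the scanner, so that both loops measure a complete read.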