Which version are you using? A Get is always a single RPC call, but a Scan may need multiple RPC calls, depending on the HBase version and on some flags set in the Scan object.
On Sun, Dec 30, 2018 at 00:06, ming.liu <ming....@esgyn.cn> wrote:
> Thanks Stack,
>
> I had the impression that a Get makes a Scan under the covers. But that
> cannot explain the performance difference I observe between a Get of a
> single row and a Scan of a single row.
>
> I assume the difference comes from the block cache: Get() first checks the
> block cache, and if the row is there, the call finishes and returns. But
> Scan does not check the block cache; it goes to the memstore and then to the
> HFile if the row is not in the memstore.
>
> My test program does Get in a loop, for example 1000 Gets. Before the loop
> I save the start time, and after 1000 iterations of Get I save the end
> time. So (endtime - starttime) / loop-count is the time spent in each Get
> operation.
> I have the same loop with get() replaced by scan(). The scan() has
> startRowKey = endRowKey, so it covers just one row.
>
> I ran the test program many times using HBase 1.2.0. It shows that Scan is
> 2x slower than Get, so I want to understand the root cause. I assume get()
> finds the row in the block cache, so it does not go to the memstore or
> HFile. But scan() must go to the HFile, because in my test there are no put
> operations, just pure reads. The row was inserted a long time ago, so it
> should have been flushed to an HFile and no longer be in the memstore. But I
> cannot confirm/verify this. So scan() has to send a request to HDFS to read
> from the HFile, which is slower than the get() operation.
>
> I can paste the test program if the description is still not clear.
>
> I may need to replace Scan with Get wherever possible if there really is a
> performance difference. But if not, I won't bother to change this.
>
> thanks,
> Ming
>
> -----Original Message-----
> From: Stack <st...@duboce.net>
> Sent: Saturday, December 29, 2018 11:50 PM
> To: Hbase-User <user@hbase.apache.org>
> Subject: Re: Will Scan use blockcache?
>
> A Get is a one-row Scan. Under the covers the Get makes a Scan. Scan/Get
> both have to go to the memstore since it holds the latest versions of Cells.
>
> Say more about how you are doing the compare please.
>
> S
>
> On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming....@esgyn.cn> wrote:
>
> > Hi, all,
> >
> > I recently found that a short scan is slower than a get operation in
> > HBase. That is acceptable, but I really want to understand the reason.
> >
> > My testing table has only one row in it, so both Scan and Get return just
> > one row. Scan is still about 2x slower than the get operation.
> >
> > So I want to understand the difference between get(rowkey) and
> > Scan(rowkey, rowkey).
> >
> > I think Get will first look in the blockcache, and if it finds a match, it
> > returns without accessing HFile/Memstore.
> >
> > Will Scan search in the blockcache as well? Or does it go directly to
> > memstore/HFile?
> >
> > thanks,
> >
> > Ming
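The timing method described in the thread (record a start time, run N operations, record an end time, divide by N) can be sketched as a small stdlib-only Java harness. This is a minimal sketch, not the poster's actual test program: the warm-up phase, the `busyWork` placeholder, and the class name `LatencyHarness` are assumptions for illustration. The HBase calls appear only in comments, because running them needs the hbase-client library and a live cluster.

```java
public class LatencyHarness {
    // Sink that keeps the placeholder work from being optimized away by the JIT.
    static volatile long sink;

    /**
     * Runs op `warmup` times untimed, then `iterations` times timed, and returns
     * the average wall-clock nanoseconds per timed call:
     * (end - start) / iterations, as in the loop described in the thread.
     */
    static double averageNanos(Runnable op, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            op.run(); // untimed: lets JIT compilation and caches settle first
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        long end = System.nanoTime();
        return (end - start) / (double) iterations;
    }

    // Hypothetical placeholder work standing in for one client request.
    static void busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i * 31L;
        }
        sink = acc;
    }

    public static void main(String[] args) {
        // Placeholders only. Against a real table with the HBase 1.x client, the
        // two operations would look roughly like this (IOException handling
        // omitted), mirroring the poster's startRowKey = endRowKey setup:
        //   table.get(new Get(row));
        //   try (ResultScanner rs = table.getScanner(new Scan(row, row))) { rs.next(); }
        Runnable getOp = () -> busyWork(1_000);
        Runnable scanOp = () -> busyWork(2_000);

        double getAvg = averageNanos(getOp, 100, 1_000);
        double scanAvg = averageNanos(scanOp, 100, 1_000);
        System.out.printf("get:  %.1f ns/op%n", getAvg);
        System.out.printf("scan: %.1f ns/op%n", scanAvg);
        System.out.printf("scan/get ratio: %.2f%n", scanAvg / getAvg);
    }
}
```

Separating warm-up from measurement keeps one-time costs, such as JIT compilation and, against a real cluster, the client's first region-location lookup, out of the per-operation average, so the Get/Scan comparison reflects steady-state behavior.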