Oh, I see, 1.2.0. Try scan.setSmall(true)?

Stack <st...@duboce.net> wrote on Tue, Jan 1, 2019 at 6:29 AM:
> On Sat, Dec 29, 2018 at 8:06 AM ming.liu <ming....@esgyn.cn> wrote:
>
> > Thanks Stack,
> >
> > I have an impression that Get makes a Scan under the covers. But that
> > cannot explain my observation of the performance difference between
> > a Get of a single row vs. a Scan of a single row.
>
> Here is how the Get gets converted into a Scan:
> https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6920
>
> Maybe try doing the same in your experiment and, if there is still a
> difference, file an issue and upload your test code. Explain how you ran
> your test (copy/paste from here). branch-1.2 is old. I'd be interested in
> trying your test against branch-2 to see if it has the issue you see.
>
> > I assume the difference comes from the blockcache. Get() will first
> > match the block cache; if it matches, the call finishes and returns.
> > But Scan will not match the block cache; it will go to memstore and
> > then go to HFile if the row is not in the memstore.
>
> We first go to memstore, and if we have not satisfied the query, then go
> to hfiles. Hfiles will fetch blocks from blockcache if present, else will
> go to hdfs (and then populate cache). Should work this way whether Get or
> Scan.
>
> Thanks,
> S
>
> > My test program does Get in a loop, for example, 1000 Gets. Before the
> > loop, I save the start time, and then after 1000 loops of Get, I save
> > the end time. So (end time - start time) / loop-count is the time
> > spent in each Get operation.
> > I have that same loop, replacing get() with scan(). The scan() has
> > startRowKey = endRowKey, so it is just one row.
> >
> > I ran the test program many times, using HBase 1.2.0. It shows the
> > Scan is 2x slower than the Get. So I want to understand the root cause.
> > I assume get() will match the row in blockcache, so it will not go to
> > the memstore or HFile. But scan() must go to HFile, because in my test
> > there is no put operation, just pure reads. The row was inserted a long
> > time ago, so it should have been flushed into an HFile and no longer be
> > in the memstore. But I cannot confirm/verify this. So scan() has to
> > send a request to HDFS to read from the HFile, and it is slower than
> > the get() operation.
> >
> > I can paste the test program if the description is still not clear.
> >
> > I may need to replace Scan with Get whenever possible, if there really
> > is a performance difference. But if that is not true, I won't bother
> > to modify this.
> >
> > thanks,
> > Ming
> >
> > -----Original Message-----
> > From: Stack <st...@duboce.net>
> > Sent: Saturday, December 29, 2018 11:50 PM
> > To: Hbase-User <user@hbase.apache.org>
> > Subject: Re: Will Scan use blockcache?
> >
> > A Get is a one-row Scan. Under the covers the Get makes a Scan.
> > Scan/Get both have to go to memstore since it will have the latest
> > versions of Cells.
> >
> > Say more about how you are doing the compare please.
> >
> > S
> >
> > On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming....@esgyn.cn> wrote:
> >
> > > Hi, all,
> > >
> > > I recently found that a short scan is slower than a get operation in
> > > HBase. It is acceptable, but I really want to understand the reason.
> > >
> > > My testing table only has one row in it. So both Scan and Get just
> > > get one row. Scan is still about 2x slower than the get operation.
> > >
> > > So I want to understand the difference between get(rowkey) and
> > > Scan(rowkey, rowkey).
> > >
> > > I think Get will first match in blockcache; if matched, it will
> > > return without accessing HFile/Memstore.
> > >
> > > Will Scan search in blockcache as well? Or does it go directly to
> > > memstore/HFile?
> > >
> > > thanks,
> > >
> > > Ming
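
For anyone reading this thread later, the read path Stack describes above (memstore first, then hfiles, with hfile blocks served from the blockcache when present and otherwise read from hdfs and cached) can be sketched as a toy model in plain Java. This is not HBase code and does not use the HBase API: the class ReadPathSketch and its methods put, flush, get, scanSingleRow and hdfsReads are hypothetical names made up for illustration, each "block" holds a single row, and "satisfying the query" is reduced to finding the row. The only point it makes is that, in this model, a Get and a one-row Scan go through the same lookup, so the blockcache is used either way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of the read path described in the thread. Not HBase code. */
final class ReadPathSketch {
    // Recent writes that have not been flushed yet.
    private final Map<String, String> memstore = new HashMap<>();
    // Cached hfile blocks (simplified: one row per block).
    private final Map<String, String> blockCache = new HashMap<>();
    // Flushed hfile data living on hdfs.
    private final Map<String, String> hdfs = new HashMap<>();
    // Counts how often a read had to go all the way to hdfs.
    private int hdfsReads = 0;

    void put(String row, String value) {
        memstore.put(row, value);
    }

    /** Simulates a flush: memstore contents become hfile data on hdfs. */
    void flush() {
        hdfs.putAll(memstore);
        memstore.clear();
    }

    /** Shared lookup used by both get and the one-row scan in this model. */
    private Optional<String> read(String row) {
        String value = memstore.get(row);      // 1. memstore first
        if (value != null) {
            return Optional.of(value);
        }
        value = blockCache.get(row);           // 2. hfile block from blockcache
        if (value != null) {
            return Optional.of(value);
        }
        value = hdfs.get(row);                 // 3. otherwise read from hdfs
        if (value != null) {
            hdfsReads++;
            blockCache.put(row, value);        //    and populate the cache
        }
        return Optional.ofNullable(value);
    }

    Optional<String> get(String row) {
        return read(row);
    }

    /** One-row scan (start row equals stop row): same lookup as get. */
    Optional<String> scanSingleRow(String row) {
        return read(row);
    }

    int hdfsReads() {
        return hdfsReads;
    }

    public static void main(String[] args) {
        ReadPathSketch region = new ReadPathSketch();
        region.put("row1", "v1");
        region.flush(); // row is now only in the "hfile", as in Ming's test

        System.out.println(region.get("row1").orElse("<missing>"));
        System.out.println(region.scanSingleRow("row1").orElse("<missing>"));
        System.out.println("hdfs reads: " + region.hdfsReads());
    }
}
```

In this sketch the first read of a flushed row goes to hdfs and every later read, Get or Scan, is served from the cache. If that also holds in real HBase, as Stack says it should, then the 2x difference Ming measured would have to come from somewhere other than the blockcache; the model says nothing about where.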