Re: Will Scan use blockcache?

Duo Zhang, Sun, 30 Dec 2018 00:24:06 -0800

Which HBase version are you using?

A Get is always a single RPC call, but a Scan may take multiple RPC calls,
depending on the HBase version and on some flags set on the Scan object.
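To make the RPC point concrete, here is a toy Java cost model. Its rules are assumptions for illustration, not HBase's actual wire protocol: a Get is one RPC, as stated above; a scan flagged as "small" (HBase 1.x has Scan.setSmall(true) for this case) is assumed to finish in one RPC; any other scan is assumed to need an open call, one next call per batch of `caching` rows, and a close call. The class and method names are hypothetical.

```java
// Toy model only: counts client round trips under ASSUMED rules,
// not HBase's real protocol. Names are hypothetical.
public class RpcCountModel {

    /** A Get is a single RPC. */
    static int getRpcs() {
        return 1;
    }

    /**
     * Assumed scan cost: a "small" scan opens, fetches and closes in one RPC;
     * otherwise one open call, one next call per batch of `caching` rows
     * (at least one), and one close call.
     */
    static int scanRpcs(int rows, int caching, boolean small) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        if (small) {
            return 1;
        }
        int nextCalls = Math.max(1, (rows + caching - 1) / caching); // ceil(rows / caching)
        return 1 + nextCalls + 1; // open + nexts + close
    }

    public static void main(String[] args) {
        System.out.println("get: " + getRpcs());                                // get: 1
        System.out.println("scan, 1 row, normal: " + scanRpcs(1, 100, false));  // 3
        System.out.println("scan, 1 row, small: " + scanRpcs(1, 100, true));    // 1
    }
}
```

Under these assumptions a one-row non-small scan costs three round trips against one for a Get, which is the kind of gap the version and Scan flags can change.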


On Sun, Dec 30, 2018 at 00:06, ming.liu <ming....@esgyn.cn> wrote:

> Thanks Stack,
>
> I was under the impression that Get does a Scan under the covers. But that
> does not explain the performance difference I observe between a Get of a
> single row and a Scan of a single row.
>
> I assume the difference comes from the blockcache: Get() first checks the
> block cache, and if the row is there, the call finishes and returns. But Scan
> does not check the block cache; it goes to the memstore, and then to the HFile
> if the row is not in the memstore.
>
> My test program does a Get in a loop, for example 1000 Gets. Before the
> loop I save the start time, and after the 1000 iterations I save the end
> time, so (endtime - starttime) / loop-count is the time spent in each Get
> operation.
> I have the same loop with get() replaced by scan(). The scan() has
> startRowKey = endRowKey, so it covers just one row.
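The measurement described above can be sketched as a small Java harness. The timed operation is a hypothetical placeholder Runnable standing in for the real get() or scan() call. The warm-up pass is a general benchmarking habit added here, not part of the original test, so that JIT compilation and connection setup are not charged to the measured iterations.

```java
// Sketch of the loop timing: time N calls, divide by N.
// `op` is a hypothetical stand-in for table.get(...) or a one-row scan.
public class LoopTimer {

    /** Runs `op` `loops` times and returns the mean wall-clock nanoseconds per call. */
    static double averageNanos(Runnable op, int loops) {
        if (loops <= 0) {
            throw new IllegalArgumentException("loops must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            op.run();
        }
        long end = System.nanoTime();
        return (double) (end - start) / loops;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Runnable fakeGet = () -> calls[0]++; // placeholder for a real Get
        averageNanos(fakeGet, 1_000);        // warm-up pass, result discarded
        double perOp = averageNanos(fakeGet, 1_000);
        System.out.printf("calls=%d, avg=%.1f ns/op%n", calls[0], perOp);
    }
}
```

Running the Get loop and the Scan loop in the same JVM, each after its own warm-up, keeps the two averages comparable.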
>
> I ran the test program many times, using HBase 1.2.0. It shows that Scan is
> 2x slower than Get, so I want to understand the root cause. I assume get()
> finds the row in the blockcache, so it does not go to the memstore or
> HFile. But scan() must go to the HFile: my test has no put operations, just
> pure reads, and the row was inserted a long time ago, so it should have been
> flushed into an HFile and no longer be in the memstore. I cannot
> confirm/verify this, though. So scan() has to send a request to HDFS to read
> from the HFile, which is slower than the get() operation.
>
> I can paste the test program if the description is still not clear.
>
> I may need to replace Scan with Get wherever possible, if there really is a
> performance difference. But if there is not, I won't bother to change
> this.
>
> thanks,
> Ming
>
> -----Original Message-----
> From: Stack <st...@duboce.net>
> Sent: Saturday, December 29, 2018 11:50 PM
> To: Hbase-User <user@hbase.apache.org>
> Subject: Re: Will Scan use blockcache?
>
> A Get is a one-row Scan. Under the covers the Get makes a Scan. Both Scan
> and Get have to go to the memstore, since it holds the latest versions of Cells.
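A toy Java model of that read path, for illustration only: every read, whether issued as a Get or as a one-row Scan, consults both the memstore and the store files, and store-file data is served from the block cache when present. Treating each row as its own "block" and using HashMaps for the memstore, HFile and cache are simplifying assumptions; none of these classes exist in HBase. In the real client, both Get and Scan expose setCacheBlocks(boolean) to control whether the blocks they read are cached.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not HBase code. Each row is treated as its own "block".
public class ReadPathModel {

    /** A cell value with its write timestamp; the newer timestamp wins. */
    static final class Cell {
        final String value;
        final long ts;
        Cell(String value, long ts) { this.value = value; this.ts = ts; }
    }

    final Map<String, Cell> memstore = new HashMap<>();
    final Map<String, Cell> hfile = new HashMap<>();      // stands in for HFiles on HDFS
    final Map<String, Cell> blockCache = new HashMap<>();
    int cacheHits = 0;
    int hdfsReads = 0;

    /** Shared by get() and scanOneRow(): always checks memstore AND store files. */
    Cell readRow(String row) {
        Cell fromMem = memstore.get(row);
        Cell fromFile = blockCache.get(row);
        if (fromFile != null) {
            cacheHits++;
        } else if (hfile.containsKey(row)) {
            hdfsReads++;                     // cache miss: read the "block" from HDFS
            fromFile = hfile.get(row);
            blockCache.put(row, fromFile);   // then keep it in the block cache
        }
        if (fromMem == null) return fromFile;
        if (fromFile == null) return fromMem;
        return fromMem.ts >= fromFile.ts ? fromMem : fromFile;
    }

    Cell get(String row) { return readRow(row); }

    Cell scanOneRow(String startRow, String stopRow) {
        // Only the startRow == stopRow case from the test is modeled.
        if (!startRow.equals(stopRow)) throw new IllegalArgumentException("one-row scans only");
        return readRow(startRow);
    }

    public static void main(String[] args) {
        ReadPathModel region = new ReadPathModel();
        region.hfile.put("r1", new Cell("v1", 1L)); // flushed long ago, memstore empty
        region.get("r1");                             // miss: loaded from "HDFS", then cached
        region.scanOneRow("r1", "r1");                // hit: served from the block cache
        System.out.println("hdfsReads=" + region.hdfsReads + " cacheHits=" + region.cacheHits);
        // prints "hdfsReads=1 cacheHits=1"
    }
}
```

In this model the order of the two calls, not their type, decides which one pays for the HDFS read.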
>
> Please say more about how you are doing the comparison.
>
> S
>
> On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming....@esgyn.cn> wrote:
>
> > Hi, all,
> >
> > I recently found that a short scan is slower than a get operation in HBase.
> > That is acceptable, but I really want to understand the reason.
> >
> > My test table has only one row in it, so both the Scan and the Get return
> > just one row. The Scan is still about 2x slower than the Get.
> >
> > So I want to understand the difference between get(rowkey) and
> > Scan(rowkey, rowkey).
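Scan(rowkey, rowkey) returns one row only because equal start and stop rows get special treatment: a Scan's stop row is normally exclusive. As I understand the HBase 1.x source, Scan.isGetScan() returns true when the start and stop rows are equal and non-empty, and that row is then included. The sketch below is a toy model of that rule over a sorted set of String keys, with an empty stop row meaning "scan to the end"; the class and method are hypothetical, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model, not HBase code: rows in [startRow, stopRow) with an exclusive
// stop row, except that equal, non-empty start and stop rows are treated as
// a single-row lookup (the rule assumed to match Scan.isGetScan()).
public class RangeModel {

    static List<String> scan(TreeSet<String> rows, String startRow, String stopRow) {
        boolean getScan = !startRow.isEmpty() && startRow.equals(stopRow);
        List<String> out = new ArrayList<>();
        for (String row : rows.tailSet(startRow, true)) {
            int cmp = row.compareTo(stopRow);
            // Empty stopRow = unbounded; get-scans include the stop row itself.
            if (!stopRow.isEmpty() && (getScan ? cmp > 0 : cmp >= 0)) break;
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(List.of("a", "b", "c"));
        System.out.println(scan(rows, "b", "b")); // [b]
        System.out.println(scan(rows, "a", "c")); // [a, b]
        System.out.println(scan(rows, "b", ""));  // [b, c]
    }
}
```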
> >
> > I think Get first looks in the blockcache, and if the row is found there,
> > it returns without accessing the HFile/memstore.
> >
> > Does Scan search the blockcache as well, or does it go directly to the
> > memstore/HFile?
> >
> > thanks,
> >
> > Ming
>
>
