Re: Will Scan use blockcache?

Duo Zhang Mon, 31 Dec 2018 17:31:00 -0800

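Oh, I see, 1.2.0. Try scan.setSmall(true)? In 1.x a regular scan takes
separate RPCs to open the scanner, fetch rows, and close it, while a small
scan does all of that in one RPC, as a Get does.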
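
For re-running the comparison, here is a minimal timing sketch along the lines of the test described below: run the operation N times and divide the elapsed time by N. The `LatencyProbe` class, the `averageNanos` name, and the warm-up pass are assumptions added for illustration; none of this is HBase API. The `Runnable` stands in for whatever call is being measured.

```java
// Hypothetical helper for illustration only; not part of HBase.
final class LatencyProbe {
    private LatencyProbe() {}

    /**
     * Runs op `warmup` times untimed, then `iterations` times timed,
     * and returns the mean wall-clock nanoseconds per timed call.
     */
    static double averageNanos(Runnable op, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        // Warm-up pass (an assumption, not in the original test): lets the
        // JIT and any caches settle before measuring.
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        long elapsed = System.nanoTime() - start;
        // (endtime - starttime) / loop-count, as in the described test.
        return (double) elapsed / iterations;
    }
}
```

For the scan case, the `Runnable` should cover the whole round trip (open the scanner, fetch the row, close the scanner) so the number is comparable with a single get().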

Stack <st...@duboce.net> wrote on Tue, Jan 1, 2019 at 6:29 AM:


> On Sat, Dec 29, 2018 at 8:06 AM ming.liu <ming....@esgyn.cn> wrote:
>
> > Thanks Stack,
> >
> > I had the impression that a Get makes a Scan under the covers. But that
> > cannot explain the performance difference I observe between a Get of a
> > single row and a Scan of a single row.
> >
> >
> Here is how the Get gets converted into a Scan:
>
> https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6920
> Maybe try doing the same in your experiment and, if there is still a
> difference, file an issue and upload your test code. Explain how you ran
> your test (copy/paste from here). branch-1.2 is old. I'd be interested in
> trying your test against branch-2 to see if it has the issue you see.
>
>
> > I assume the difference comes from the blockcache: Get() first looks in
> > the block cache and, on a match, the call finishes and returns. But Scan
> > does not look in the block cache; it goes to the memstore and then to the
> > HFile if the row is not in the memstore.
> >
> >
> We first go to memstore, and if we have not satisfied the query, then go to
> hfiles. Hfiles will fetch blocks from blockcache if present, else will
> go to hdfs (and then populate cache). Should work this way whether Get or
> Scan.
>
> Thanks,
> S
>
>
> > My test program does Get in a loop, for example 1000 Gets.
> > Before the loop I save the start time, and after the 1000 Gets I
> > save the end time. So (endtime - starttime) / loop-count is the time
> > spent in each Get operation.
> > I have the same loop with get() replaced by scan(). The scan() has
> > startRowKey = endRowkey, so it is just one row.
> >
> > I ran the test program many times, using HBase 1.2.0. It shows the Scan
> > is 2x slower than the Get, so I want to understand the root cause. I
> > assume get() matches the row in the blockcache, so it does not go to the
> > memstore or HFile. But scan() must go to the HFile, because in my test
> > there is no put operation, just pure reads. The row was inserted a long
> > time ago, so it should have been flushed to an HFile and no longer be in
> > the memstore. But I cannot confirm/verify this. So scan() has to send a
> > request to HDFS to read from the HFile, and it is slower than the get()
> > operation.
> >
> > I can paste the test program if the description is still not clear.
> >
> > I may need to replace Scan with Get wherever possible if there really is
> > a performance difference. But if there is not, I won't bother to modify
> > this.
> >
> > thanks,
> > Ming
> >
> > -----Original Message-----
> > From: Stack <st...@duboce.net>
> > Sent: Saturday, December 29, 2018 11:50 PM
> > To: Hbase-User <user@hbase.apache.org>
> > Subject: Re: Will Scan use blockcache?
> >
> > A Get is a one-row Scan. Under the covers the Get makes a Scan. Scan/Get
> > both have to go to memstore since it will have latest versions of Cells.
> >
> > Say more about how you are doing the compare please.
> >
> > S
> >
> > On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming....@esgyn.cn> wrote:
> >
> > > Hi, all,
> > >
> > >
> > >
> > > I recently found that a short scan is slower than a get operation in
> > > HBase. That is acceptable, but I really want to understand the reason.
> > >
> > >
> > >
> > > My test table has only one row in it, so both Scan and Get fetch just
> > > one row. Scan is still about 2x slower than the get operation.
> > >
> > > So I want to understand the difference between get(rowkey) and
> > > Scan(rowkey, rowkey).
> > >
> > >
> > >
> > > I think Get first looks in the blockcache and, if it matches, returns
> > > without accessing the HFile/memstore.
> > >
> > > Will Scan search the blockcache as well? Or does it go directly to the
> > > memstore/HFile?
> > >
> > >
> > >
> > > thanks,
> > >
> > > Ming
> > >
> > >
> > >
> > >
> >
> >
>
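
As an illustration of the read path described in the thread (memstore first, then HFile blocks served from the blockcache when present, otherwise read from HDFS and cached), here is a toy model in plain Java. It is a sketch under simplifying assumptions, not HBase code: `ReadPathModel` and all of its methods are hypothetical, each row sits in its own block, and versions and timestamps are ignored. The point is only that get and scan share the same lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model for illustration; NOT how HBase is implemented.
final class ReadPathModel {
    private final TreeMap<String, String> memstore = new TreeMap<>();
    // Stands in for HFile blocks stored on HDFS (one row per block here).
    private final TreeMap<String, String> hdfs = new TreeMap<>();
    private final Map<String, String> blockCache = new HashMap<>();
    private int hdfsReads;
    private int cacheHits;

    void put(String row, String value) { memstore.put(row, value); }

    // Moves everything from the memstore into the "HFile".
    void flush() { hdfs.putAll(memstore); memstore.clear(); }

    // Shared lookup: memstore, then blockcache, then HDFS (populating the cache).
    private String read(String row) {
        String value = memstore.get(row);
        if (value != null) return value;
        value = blockCache.get(row);
        if (value != null) { cacheHits++; return value; }
        value = hdfs.get(row);
        if (value != null) { hdfsReads++; blockCache.put(row, value); }
        return value;
    }

    String get(String row) { return read(row); }

    // Both bounds are inclusive in this toy, so scan(r, r) returns one row.
    List<String> scan(String startRow, String stopRow) {
        TreeSet<String> rows =
            new TreeSet<>(memstore.subMap(startRow, true, stopRow, true).keySet());
        rows.addAll(hdfs.subMap(startRow, true, stopRow, true).keySet());
        List<String> out = new ArrayList<>();
        for (String row : rows) out.add(read(row));
        return out;
    }

    int hdfsReads() { return hdfsReads; }
    int cacheHits() { return cacheHits; }
}
```

In this model, a get() on a flushed row costs one HDFS read, and a following scan() over the same row is served from the cache, which matches the statement that the path works the same way for Get and Scan.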
