Thanks Stack,

My impression is also that a Get makes a Scan under the covers. But that does
not explain the performance difference I observe between a Get of a single row
and a Scan of a single row.

I assumed the difference comes from the block cache: Get() first checks the
block cache and, on a hit, the call finishes and returns. Scan, I assumed, does
not check the block cache; it goes to the memstore and then to the HFile if the
row is not in the memstore.

My test program does Get in a loop, for example 1000 Gets. Before the loop I
save the start time, and after the 1000 Gets I save the end time, so
(endtime - starttime) / loop-count is the time spent in each Get operation.
I have the same loop with get() replaced by scan(). The scan() has
startRowKey = endRowKey, so it covers just one row.
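
Roughly, the timing loop has the shape sketched below. This is a minimal
sketch, not the actual test program: the class name ReadLatencyBench, the
helper meanNanosPerCall, and the warmup parameter are made up for
illustration, and the HBase calls are left as empty placeholders so the
sketch runs without a cluster. In the real test the two Runnable bodies would
be the client calls (something like table.get(new Get(row)) and opening,
reading, and closing table.getScanner(new Scan(row, row))).

```java
public class ReadLatencyBench {

    /**
     * Runs op `warmup` times untimed, then `loops` times timed,
     * and returns the mean wall-clock nanoseconds per timed call.
     * The warmup phase is an addition for illustration; the test
     * described above only mentions the timed loop.
     */
    static double meanNanosPerCall(Runnable op, int warmup, int loops) {
        if (loops <= 0) {
            throw new IllegalArgumentException("loops must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            op.run();
        }
        long end = System.nanoTime();
        // (endtime - starttime) / loop-count
        return (end - start) / (double) loops;
    }

    public static void main(String[] args) {
        int loops = 1000;

        // Placeholders (assumption): replace the empty bodies with the
        // single-row Get and the single-row Scan against the test table.
        Runnable getOp = () -> { };
        Runnable scanOp = () -> { };

        double getNs = meanNanosPerCall(getOp, 100, loops);
        double scanNs = meanNanosPerCall(scanOp, 100, loops);

        System.out.printf("get:  %.0f ns/op%n", getNs);
        System.out.printf("scan: %.0f ns/op%n", scanNs);
    }
}
```

Both operations go through the same helper, so any loop or timer overhead is
the same on each side of the comparison.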

I ran the test program many times, using HBase 1.2.0. It consistently shows
Scan to be about 2x slower than Get, so I want to understand the root cause. My
assumption is that get() finds the row in the block cache, so it does not go to
the memstore or the HFile. But scan() must go to the HFile, because my test
does no puts, only reads. The row was inserted a long time ago, so it should
have been flushed to an HFile and should no longer be in the memstore, although
I cannot confirm or verify this. So scan() has to send a request to HDFS to
read from the HFile, which is slower than the get() operation.

I can paste the test program if the description is still not clear.

I may need to replace Scan with Get wherever possible if there really is a
performance difference. If there is not, I won't bother modifying the code.

thanks,
Ming

-----Original Message-----
From: Stack <st...@duboce.net> 
Sent: Saturday, December 29, 2018 11:50 PM
To: Hbase-User <user@hbase.apache.org>
Subject: Re: Will Scan use blockcache?

A Get is a one-row Scan. Under the covers the Get makes a Scan. Scan/Get
both have to go to memstore since it will have latest versions of Cells.

Say more about how you are doing the compare please.

S

On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming....@esgyn.cn> wrote:

> Hi, all,
>
>
>
> I recently found that short scan is slower than get operation in HBase. It
> is acceptable, but I really want to understand the reason.
>
>
>
> My testing table only has one row in it. So both Scan and Get just get one
> row. Scan is still about 2x slower than get operation.
>
> So I want to understand the difference between get(rowkey) and Scan(rowkey,
> rowkey).
>
>
>
> I think Get will first look in the blockcache and, if it matches, return
> without accessing HFile/Memstore;
>
> Will Scan search the blockcache as well? Or does it go directly to
> memstore/HFile?
>
>
>
> thanks,
>
> Ming
>
>
>
>
