HBase always loads the whole block and then seeks forward in that block until it finds the KV it is looking for (there is no indexing inside the block).
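As a toy illustration of that lookup path (this is a sketch, not HBase's actual code, and the keys/values are made up): the block index is binary-searched to find the right block, but inside a block the reader can only scan forward key by key.

```python
# Toy model of an HFile block lookup. The block index locates the block whose
# first key <= target (binary search), then the whole block is read and
# scanned forward linearly -- there is no per-KV index inside a block.
import bisect

def get(block_index, blocks, key):
    # block_index: sorted list of each block's first key
    # blocks: parallel list of sorted (key, value) lists
    i = bisect.bisect_right(block_index, key) - 1  # binary search over blocks
    if i < 0:
        return None                                # key sorts before all blocks
    for k, v in blocks[i]:                         # linear scan within the block
        if k == key:
            return v
        if k > key:                                # passed where it would be
            break
    return None

blocks = [[("a", 1), ("c", 2)], [("d", 3), ("f", 4)]]
index = ["a", "d"]
print(get(index, blocks, "f"))  # -> 4
print(get(index, blocks, "b"))  # -> None (scan stops at "c")
```

The point of the sketch: the whole block is the unit of I/O and of cache residency, which is why BLOCKSIZE matters for random-read workloads.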
Also note that HBase has both compression and block encoding; these are different. Compression compresses the files on disk (at the HDFS level), not in memory, so it does not help with your cache size. Encoding is applied at the HBase block level and is retained in the block cache. I'm really curious as to what kind of improvement you see with a smaller block size. Remember that after you change BLOCKSIZE you need to issue a major compaction so that the data is rewritten into smaller blocks. We should really document this stuff better.

-- Lars

________________________________
From: Jan Schellenberger <[email protected]>
To: [email protected]
Sent: Friday, January 31, 2014 10:31 PM
Subject: RE: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)

A lot of useful information here...

I disabled bloom filters. I changed to gz compression (which compressed the files significantly). I'm now seeing about *80 gets/sec/server*, which is a pretty good improvement. Since I estimate that the server is capable of about 300-350 hard disk operations/second, that works out to about 4 hard disk operations per get. I will experiment with BLOCKSIZE next.

Unfortunately, upgrading our system to a newer HBase/Hadoop is tricky for various IT/regulation reasons, but I'll ask to upgrade. From what I see, even Cloudera 4.5.0 still ships with HBase 0.94.6.

I also restarted the regionservers and am now getting blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%. So conceivably, each get could be hitting the:

root index (cache hit)
block index (cache hit)
load of, on average, 2 blocks to get the data (cache misses, most likely, as my total heap space is 1/7 the compressed dataset)

That would be about a 52% cache hit rate overall, and if each data access requires 2 hard drive reads (data + checksum), that would explain my throughput. It still seems high, but it is probably within the realm of reason.
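The arithmetic in the message above can be checked with a quick sketch. The 320 IOPS figure is just the midpoint of the 300-350 ops/second estimate given in the message, and the two-reads-per-miss assumption (data + checksum) is Jan's hypothesis, not a measured value.

```python
# Back-of-envelope check of the numbers in the message above.
cache_hits_per_get = 2    # root index + block index, assumed cached
cache_misses_per_get = 2  # ~2 data blocks loaded from disk per get

# Fraction of block accesses served from cache, per get
hit_ratio = cache_hits_per_get / (cache_hits_per_get + cache_misses_per_get)

disk_reads_per_miss = 2   # data read + separate checksum read, per the message
disk_ops_per_get = cache_misses_per_get * disk_reads_per_miss

disk_iops = 320           # midpoint of the 300-350 ops/sec estimate
gets_per_sec = disk_iops / disk_ops_per_get

print(hit_ratio)       # -> 0.5, close to the observed 51% hit ratios
print(disk_ops_per_get)  # -> 4, matching the ~4 disk ops/get estimate
print(gets_per_sec)    # -> 80.0, matching the observed ~80 gets/sec/server
```

Under these assumptions the model lines up with all three observed figures, which makes the "index blocks cached, data blocks not" explanation plausible.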
Does HBase always read a full block (the 64k HFile block, not the HDFS block) at a time, or can it just jump to a particular location within the block?

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html
Sent from the HBase User mailing list archive at Nabble.com.
