A lot of useful information here... I disabled Bloom filters and switched to GZ compression, which compressed the files significantly.
I'm now seeing about *80 gets/sec/server*, which is a pretty good improvement. Since I estimate that the server is capable of about 300-350 hard disk operations/second, that's about 4 hard disk operations per get. I will experiment with the BLOCKSIZE next. Unfortunately, upgrading our system to a newer HBase/Hadoop is tricky for various IT/regulation reasons, but I'll ask to upgrade. From what I see, even Cloudera 4.5.0 still ships with HBase 0.94.6.

I also restarted the regionservers and am now getting blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%. So conceivably, each get could be hitting:

- root index (cache hit)
- block index (cache hit)
- on average 2 data blocks loaded to get the data (cache misses, most likely, as my total heap space is 1/7 of the compressed dataset)

That would be about a 50% cache hit ratio overall, and if each data-block miss requires 2 hard drive reads (data + checksum), that would explain my throughput. It still seems high, but probably within the realm of reason.

Does HBase always read a full block (the 64k HFile block, not the HDFS block) at a time, or can it just jump to a particular location within the block?

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html
Sent from the HBase User mailing list archive at Nabble.com.
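For what it's worth, the arithmetic above can be sketched as a quick back-of-envelope model. The numbers (325 disk ops/sec as the midpoint of the 300-350 estimate, 2 cached index hits, 2 uncached data blocks, 2 disk reads per miss) are the assumptions from the post, not measurements:

```python
# Back-of-envelope model of the observed Get throughput.
# All inputs are assumptions/estimates from the discussion above.

disk_ops_per_sec = 325        # midpoint of the estimated 300-350 ops/sec capacity
gets_per_sec = 80             # observed throughput per regionserver

ops_per_get = disk_ops_per_sec / gets_per_sec
print(f"implied disk ops per get: {ops_per_get:.1f}")        # ~4.1

# Hypothesized block accesses per Get: root index (hit), block index (hit),
# and on average 2 data blocks (misses, since heap is ~1/7 of the dataset).
hits, misses = 2, 2
hit_ratio = hits / (hits + misses)
print(f"modeled block cache hit ratio: {hit_ratio:.0%}")     # 50%, vs. 51% observed

# If each data-block miss costs two disk reads (data + checksum):
reads_per_get = misses * 2
print(f"modeled disk reads per get: {reads_per_get}")        # 4
```

The modeled 4 disk reads per get times the observed 80 gets/sec lands at 320 ops/sec, right inside the estimated 300-350 ops/sec disk capacity, which is why the numbers hang together.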
