Could storing HFile blocks consecutively on disk improve read performance?

yun peng Tue, 09 Jul 2013 09:07:59 -0700

In our use case, memory/cache is small, and we want to improve read/load
(from-disk) performance by storing HFile blocks consecutively on disk.
The idea is that if blocks are stored closer together on disk, then reading a
data block from an HFile would require fewer random disk accesses.


In particular, looking up a value or reading a data block in an HFile requires
a B-tree-style root-to-leaf traversal. Each step in the traversal needs to
load a block from disk. Since the blocks along the root-to-leaf path
are not stored consecutively, those reads are typically random. I am not
sure whether we can store all the blocks on a root-to-leaf path in a
consecutive disk area; if so, we could turn random reads into sequential
reads, which should be faster.
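
To make the idea concrete, here is a toy sketch of what I mean. It is not
HBase code: the block size, the offsets, and the names (count_seeks,
scattered_path, consecutive_path) are all made up for illustration, and it
assumes every block has the same size and nothing is cached. It only counts
how many reads on one root-to-leaf path do not start where the previous
read ended.

```python
# Toy model, not HBase code: count disk seeks for one root-to-leaf lookup.
BLOCK_SIZE = 64 * 1024  # assumed block size in bytes (illustrative only)


def count_seeks(offsets, block_size=BLOCK_SIZE):
    """Count reads that do not start where the previous read ended."""
    seeks = 0
    position = None  # offset just past the previous read; None before any read
    for offset in offsets:
        if offset != position:
            seeks += 1  # the read is not contiguous: a random access
        position = offset + block_size
    return seeks


# Hypothetical offsets for: root index, intermediate index, leaf index, data block.
scattered_path = [900 * BLOCK_SIZE, 410 * BLOCK_SIZE, 120 * BLOCK_SIZE, 7 * BLOCK_SIZE]
consecutive_path = [(100 + i) * BLOCK_SIZE for i in range(4)]

print(count_seeks(scattered_path))    # one seek per block on the path
print(count_seeks(consecutive_path))  # one initial seek, then sequential reads
```

In this model the scattered layout costs one seek per level, while the
consecutive layout costs a single seek for the whole path. The sketch
leaves out the fact that index blocks are shared by many paths, so a real
layout could be consecutive for only some of them.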

Regards,
Yun
