Hi William. Answers inline.

> -----Original Message-----
> From: William Kang [mailto:[email protected]]
> Sent: Monday, October 18, 2010 7:48 PM
> To: hbase-user
> Subject: HBase random access in HDFS and block indices
>
> Hi,
> Recently I have spent some efforts to try to understand the mechanisms
> of HBase to exploit possible performance tunning options. And many
> thanks to the folks who helped with my questions in this community, I
> have sent a report. But, there are still few questions left.
>
> 1. If a HFile block contains more than one keyvalue pair, will the
> block index in HFile point out the offset for every keyvalue pair in
> that block? Or, the block index will just point out the key ranges
> inside that block, so you have to traverse inside the block until you
> meet the key you are looking for?

It is the latter. The block index points to the start key of each block, so you effectively have a key range for each block. A lot of work has gone in recently on seek/reseek/early-out so that unnecessary blocks are skipped when possible.

> 2. When HBase read block to fetching the data or traverse in it, is
> this block read into memory?

Yes. And if the block cache is turned on, the block will be put into an LRU cache.

> 3. HBase blocks (64k configurable) are inside HDFS blocks (64m
> configurable), to read the HBase blocks, we have to random access the
> HDFS blocks. Even HBase can use in(p, buf, 0, x) to read a small
> portion of the larger HDFS blocks, it is still a random access. Would
> this be slow?

Yes, this is still random access. HBase provides indexing, retrieval, and caching on top of HDFS to make random reads as efficient as possible, and it also makes random writes possible.

JG

>
> Many thanks. I would be grateful for your answers.
>
>
> William
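
P.S. To make the three answers above concrete, here is a minimal Python sketch of the lookup path: an index that holds only each block's start key, a scan inside the chosen block, and an LRU cache for blocks that have been read. It is a toy model and not HBase's actual code. The class name ToyHFile, the parameters pairs_per_block and cache_blocks, and the block_reads counter are all made up for illustration, and real HFile blocks are sized in bytes rather than in number of pairs.

```python
import bisect
from collections import OrderedDict


class ToyHFile:
    """Toy model of a block-indexed sorted file; not HBase's real format."""

    def __init__(self, sorted_items, pairs_per_block=3, cache_blocks=2):
        # Split the sorted (key, value) pairs into fixed-size "blocks".
        self.blocks = [sorted_items[i:i + pairs_per_block]
                       for i in range(0, len(sorted_items), pairs_per_block)]
        # The index holds only the first key of each block (answer 1).
        self.index = [block[0][0] for block in self.blocks]
        self.cache = OrderedDict()  # block number -> block, in LRU order
        self.cache_blocks = cache_blocks
        self.block_reads = 0  # counts simulated reads from storage

    def _load_block(self, n):
        if n in self.cache:
            self.cache.move_to_end(n)  # mark as most recently used
            return self.cache[n]
        # Cache miss: stands in for a random-access read from HDFS (answer 3).
        self.block_reads += 1
        block = self.blocks[n]
        # The whole block is held in memory and cached (answer 2).
        self.cache[n] = block
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used
        return block

    def get(self, key):
        # Binary search for the last block whose start key is <= key.
        n = bisect.bisect_right(self.index, key) - 1
        if n < 0:
            return None  # key sorts before the first block
        # Scan inside the block, stopping early once we pass the key.
        for k, v in self._load_block(n):
            if k == key:
                return v
            if k > key:
                break
        return None


items = [(f"row{i:02d}", i) for i in range(10)]
hfile = ToyHFile(items)
value = hfile.get("row04")   # found in the second block; value is 4
again = hfile.get("row05")   # same block, served from the cache
reads = hfile.block_reads    # 1: only one simulated storage read so far
```

The point of the design is that the index stays small (one entry per block, not per keyvalue), at the cost of a short scan inside the block on each lookup.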
