Hello,

On Tue, Jan 31, 2012 at 3:45 PM, Stack <[email protected]> wrote:

> On Mon, Jan 30, 2012 at 5:27 PM, Zheng Da <[email protected]> wrote:
> > Hello,
> >
> > I'm thinking of using HBase to store a matrix, so each subblock of a
> > matrix is stored as a value in HBase, and the key of the value is the
> > location of the subblock in the matrix. At first, I wanted each subblock
> > to be as large as 8MB. But when I read
> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html, I
> > found that HBase splits keyvalue pairs into blocks and that the block
> > size is usually much smaller than 8MB. So what happens if I store 8MB of
> > data as a value in HBase? I tried it, and it seems to work fine. But
> > what about the performance?
> >
>
> Please point to what in that blog has you thinking we split keyvalues.
>  We do not.
>
It mentions "block size", and the figure shows data split into blocks, each
starting with a magic header that indicates whether the data in the block is
compressed. Also, blocks in HBase are indexed.

"Minimum block size. We recommend a setting of minimum block size between
8KB to 1MB for general usage. Larger block size is preferred if files are
primarily for sequential access. However, it would lead to inefficient
random access (because there are more data to decompress). Smaller blocks
are good for random access, but require more memory to hold the block
index, and may be slower to create (because we must flush the compressor
stream at the conclusion of each data block, which leads to an FS I/O
flush). Further, due to the internal caching in Compression codec, the
smallest possible block size would be around 20KB-30KB."
The post then continues:

"So each block with its prefixed "magic" header contains either plain or
compressed data. How that looks like we will have a look at in the next
section."

If data isn't split into blocks, how do the magic headers and the block
index work?
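
For context, here is roughly how I set the block size when I experimented
with this, using the HBase client API. The table and family names are
placeholders I made up for this sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateMatrixTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // "matrix" and "b" are made-up names for this sketch.
    HTableDescriptor table = new HTableDescriptor("matrix");
    HColumnDescriptor family = new HColumnDescriptor("b");

    // HFile block size for this family (the default is 64KB). As I
    // understand it, this is the unit HBase reads and indexes inside a
    // store file, not a limit on how large a single keyvalue can be.
    family.setBlocksize(1024 * 1024); // 1MB, for mostly sequential reads

    table.addFamily(family);
    admin.createTable(table);
  }
}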

>
> Writing, we persist files that by default use HDFS blocks of 64MB.
> Reading, we will by default read in 64KB chunks (HBase read blocks).
> The 64KB chunk will contain whole keyvalues, which means we rarely read
> exactly 64KB.  If a keyvalue is 8MB, though we're configured to read
> in 64KB blocks, we'll read the coherent 8MB keyvalue in as one block.
>
> Performance-wise, it's best you try it out.  Be aware that unless you
> configure stuff otherwise, this 8MB block coming up out of the
> filesystem will probably traverse the read-side block cache and blow
> out a bunch of lesser entries.  These are the kind of things you'll
> need to consider.  Check out the performance section in the HBase
> reference guide: http://hbase.apache.org/book.html#performance
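
If I read that right, one way to keep these 8MB values from churning the
block cache would be to disable caching for that family, or to skip the
cache for big scans. A rough sketch (the family name "b" is made up):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.Scan;

public class CacheTuning {
  public static void main(String[] args) {
    // Option 1: never cache this family's blocks on read.
    HColumnDescriptor family = new HColumnDescriptor("b");
    family.setBlockCacheEnabled(false);

    // Option 2: keep the family cacheable, but tell a bulk scan not to
    // pollute the block cache with the blocks it pulls in.
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
  }
}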


Thanks,
Da
