On Mon, Jan 30, 2012 at 5:27 PM, Zheng Da <[email protected]> wrote:
> Hello,
>
> I'm thinking of using HBase to store a matrix, so each subblock of a matrix
> is stored as a value in HBase, and the key of the value is the location of
> the subblock in the matrix. At beginning, I wanted the subblock to be as
> large as 8MB. But when I read
> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html, I
> found HBase splits keyvalue pairs into blocks and the block size is usually
> much smaller than 8MB. So what happens if I store data of 8MB as a value in
> HBase? I tried, and it seems to work fine. But how about the performance?
>
Please point to what in that blog has you thinking we split keyvalues. We do not.

When writing, we persist files that by default use HDFS blocks of 64MB. When reading, we by default read in 64KB chunks (HBase read blocks). A read block holds only whole keyvalues, which means we rarely read exactly 64KB. If a keyvalue is 8MB, then even though we're configured to read in 64KB blocks, we'll read the whole 8MB keyvalue in as a single block.

Performance-wise, it's best you try it out. Be aware that unless you configure things otherwise, this 8MB block coming up out of the filesystem will probably traverse the read-side block cache and blow out a bunch of lesser entries. These are the kinds of things you'll need to consider. Check out the performance section in the HBase reference guide:

http://hbase.apache.org/book.html#performance

St.Ack
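A toy Python model of the two effects described in the reply above may help make them concrete. It is not HBase code, and `pack_blocks` and `ByteLRUCache` are hypothetical names. `pack_blocks` assumes a block is closed once it holds at least the configured size (64KB here) and that a keyvalue is never split, so a single 8MB keyvalue yields one block of roughly 8MB. `ByteLRUCache` is a plain least-recently-used cache bounded by bytes, used as a stand-in for the read-side block cache. It shows one large block pushing out many small ones. Block headers, compression, encoding and HBase's real eviction policy are all ignored.

```python
from collections import OrderedDict

# Toy model only, not HBase source. 64KB mirrors the default read block
# size mentioned above; pack_blocks and ByteLRUCache are hypothetical names.
BLOCK_SIZE = 64 * 1024


def pack_blocks(kv_sizes, block_size=BLOCK_SIZE):
    """Group keyvalue sizes (bytes) into blocks.

    A block is closed once it holds at least block_size bytes. Keyvalues are
    never split, so a block can be much larger than block_size.
    """
    blocks, current = [], 0
    for size in kv_sizes:
        current += size
        if current >= block_size:
            blocks.append(current)
            current = 0
    if current:
        blocks.append(current)  # trailing partial block
    return blocks


class ByteLRUCache:
    """Least-recently-used cache bounded by total bytes, not entry count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # key -> size, oldest first

    def put(self, key, size):
        """Insert a block and return the keys evicted to make room."""
        if key in self.entries:
            self.used -= self.entries.pop(key)
        self.entries[key] = size
        self.used += size
        evicted = []
        # Evict oldest entries until the byte budget is respected.
        while self.used > self.capacity and self.entries:
            old_key, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_key)
        return evicted


if __name__ == "__main__":
    # 132 small 1000-byte keyvalues followed by one 8MB keyvalue.
    blocks = pack_blocks([1000] * 132 + [8 * 1024 * 1024])
    print(blocks)  # [66000, 66000, 8388608]

    # A 10MB cache filled with 100 small blocks, then hit by one 8MB block.
    cache = ByteLRUCache(10 * 1024 * 1024)
    for i in range(100):
        cache.put(f"small-{i}", 66000)
    evicted = cache.put("big", 8 * 1024 * 1024)
    print(len(evicted))  # 69
```

In this sketch the single 8MB block evicts 69 of the 100 small blocks. That is the "blow out a bunch of lesser entries" effect in miniature. How much this matters on a real cluster depends on cache size and access pattern, so measuring remains the way to find out.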
