Thanks for your response, Jonathan. We'll be doing mostly single-row random lookups. In that case, would it be best to size blocks so that each one holds a single row? And how significant is the performance hit if HBase has to fetch multiple blocks to serve a single row?
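To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which data is cut at exactly the configured block size and the index holds one entry per block; HBase's real on-disk layout may well differ, so treat the numbers as illustrative only. The function names and the 100 GB total are hypothetical, and the 64 KB figure is what I understand the default block size to be.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)

def blocks_touched(row_bytes, block_bytes):
    """Blocks one row would span if data were cut at exactly block_bytes (simplified)."""
    return max(1, ceil_div(row_bytes, block_bytes))

def index_entries(total_bytes, block_bytes):
    """Index entries needed, assuming one entry per block (simplified)."""
    return ceil_div(total_bytes, block_bytes)

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

TOTAL = 100 * GB  # hypothetical amount of stored data, for illustration only

for block in (64 * KB, 512 * KB, 2 * MB):
    print(block // KB, "KB block:",
          blocks_touched(300 * KB, block), "block(s) for a 300 KB row,",
          blocks_touched(10 * MB, block), "block(s) for a 10 MB row,",
          index_entries(TOTAL, block), "index entries")
```

Under these assumptions, larger blocks mean fewer fetches per row and a smaller index, while smaller blocks mean the reverse, which is the tension I'm trying to resolve.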
On May 18, 2010, at 3:12 PM, Jonathan Gray wrote:

> It would depend on your read patterns.
>
> Is everything going to be single row gets, or will you also scan?
>
> Single row lookups will be faster with smaller block sizes, at the expense of
> a larger index size (and potentially slower scans as you have to deal with
> more block fetches).
>
>> -----Original Message-----
>> From: Jason Strutz [mailto:[email protected]]
>> Sent: Tuesday, May 18, 2010 9:33 AM
>> To: [email protected]
>> Subject: Optimal block size for large columns
>>
>> I am working with a small cluster, trying to nail down appropriate
>> settings for block size. We will have a single table with a single
>> column of data averaging 300k in size, sometimes upwards of 2mb, never
>> more than 10mb.
>>
>> Is there any rule-of-thumb or other sage advice for block sizes for
>> large columns?
>>
>> Thanks!
