On Wed, Nov 3, 2010 at 7:15 AM, Wojciech Langiewicz <[email protected]> wrote:
>
> I'm running latest version from Cloudera

Try a later version of the 0.89 series. See the downloads page on our site.
It has perf. improvements.

>> Each KV is a distinct Put operation? Normally people get high throughput
>> by batching many Puts at once.
>>
>
> Actually, here I'm asking about Get operations, because I don't know how to
> batch them (by design). But in case of Puts you are right.
>

There is a batch Get in TRUNK that should be available as 0.90.0RC0 soon.

> I'm rather asking what can I expect from my schema design and hardware by
> comparing other people solutions, right now I'm getting 10 times less
> performance that I initially wanted.
>
>

Well, if a read goes to disk, we're talking 10-30ms a hit. If you are
reading from cache, you should see 5ms or less. Try upping the proportion of
your heap given over to block cache; set hfile.block.cache.size to 0.4 or 0.5
of heap. (Writes should be going in pretty fast -- ~5ms or less.)

What size are your cells? How many regions in your table? How much RAM have
you given over to HBase? Anything else running on these machines? Are you
doing any wacky RAID'ing on those disks?

Good luck,
St.Ack
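
As a rough back-of-the-envelope sketch of what those latency figures imply for Get throughput: the snippet below assumes a client issuing Gets one at a time per thread, with no batching, so each thread completes at most 1000 / latency_ms Gets per second. The function name `gets_per_second` and its `threads` parameter are made up for illustration and are not part of HBase; treat the numbers as a ceiling, not a prediction.

```python
def gets_per_second(latency_ms: float, threads: int = 1) -> float:
    """Rough ceiling on Get throughput when each thread issues one Get at a time.

    latency_ms: assumed average time for a single Get, in milliseconds.
    threads:    assumed number of independent client threads.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # One thread finishes 1000 / latency_ms Gets per second; threads scale linearly
    # only under the assumption that they do not contend with each other.
    return threads * 1000.0 / latency_ms


if __name__ == "__main__":
    # Latency figures taken from the reply above: 10-30 ms from disk, ~5 ms from cache.
    scenarios = [("disk, slow", 30.0), ("disk, fast", 10.0), ("block cache", 5.0)]
    for label, latency_ms in scenarios:
        rate = gets_per_second(latency_ms)
        print(f"{label:12s} {latency_ms:5.1f} ms/Get -> {rate:6.1f} Gets/s per thread")
```

By this estimate a single thread reading from disk tops out at roughly 33 to 100 Gets per second, and about 200 per second when served from block cache, so getting more reads out of cache or running more client threads are the obvious levers until the batch Get is available.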
