Right now I have 4GB of heap per regionserver and, as Stack suggested, I have set hfile.block.cache.size to 0.5. While I'm doing the Gets, nothing else is running that would affect performance. Cells are very small: each contains one integer. The table has about 20M rows and spans 4 regionservers, so I have about 64 regions of 256MB each.
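A back-of-envelope check of these figures, as a rough sketch only. It assumes that 256MB is the actual size of each region (not just the configured maximum), that the whole block cache is available to this one table, and it takes 5ms and 20ms as illustrative cache and disk latencies picked from the ranges Stack quoted. The function names are made up for this example.

```python
# Rough estimate: how much of the table could the block cache hold,
# and what average Get latency would a given hit ratio imply?

def cache_coverage(heap_gb, cache_fraction, servers, regions, region_mb):
    """Return (total cache GB, total data GB, fraction of data that fits)."""
    cache_gb = heap_gb * cache_fraction * servers
    data_gb = regions * region_mb / 1024.0
    return cache_gb, data_gb, min(1.0, cache_gb / data_gb)

def expected_latency_ms(hit_ratio, cache_ms=5.0, disk_ms=20.0):
    """Blend cache and disk latency by hit ratio (latencies are assumptions)."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms

# Figures from the message: 4GB heap, 0.5 block cache, 4 servers, 64 x 256MB.
cache_gb, data_gb, coverage = cache_coverage(4.0, 0.5, 4, 64, 256)
print(f"cache: {cache_gb} GB, data: {data_gb} GB, coverage: {coverage:.0%}")
print(f"avg Get at that hit ratio: {expected_latency_ms(coverage)} ms")
```

Under these assumptions the cache can hold at most about half of the data. The real hit ratio right after a restart will be lower than this upper bound, since the cache starts empty and each row is read only once.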
I use RAID, and this will change soon, but it takes time (we're moving to new servers). I didn't notice any improvement after changing hfile.block.cache.size. I don't know if this is relevant, but in my test job I do at most one Get per row (I do a DISTINCT before querying HBase). Stats from cache reads are here: http://pastebin.com/BmmL09dK These were taken after restarting the servers, while the first job was running. Thanks for helping me.

2010/11/3 Stack <[email protected]>

> On Wed, Nov 3, 2010 at 7:15 AM, Wojciech Langiewicz
> <[email protected]> wrote:
> >
> > I'm running latest version from Cloudera
>
> Try a later version of the 0.89 series. See the downloads page on our
> site. It has perf. improvements.
>
> >> Each KV is a distinct Put operation? Normally people get high throughput
> >> by batching many Puts at once.
> >>
> >
> > Actually, here I'm asking about Get operations, because I don't know how to
> > batch them (by design). But in case of Puts you are right.
> >
>
> There is a batch Get in TRUNK that should be available as 0.90.0RC0 soon.
>
> > I'm rather asking what can I expect from my schema design and hardware by
> > comparing other people solutions, right now I'm getting 10 times less
> > performance that I initially wanted.
> >
>
> Well, if going to disk, reading we're talking 10-30ms a hit. If you
> are reading from cache, you should see 5ms and less. Try upping
> proportion of your heap given over to block cache; set
> hfile.block.cache.size to 0.4 or 0.5 of heap (Writes should be going
> in pretty fast -- ~5m or less).
>
> What size your cells? How many regions in your table? How much RAM
> have you given over to HBase? Anything else running on these
> machines? You doing any wacky RAID'ing on those disks?
>
> Good luck,
> St.Ack
>

--
Wojciech Langiewicz
