Thanks Stack. We will try those patches, upgrade to 0.90.3, and see how things improve. I will update in a few days.
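Before getting into the details, here is a quick sketch of how we can double-check what the region servers actually resolve for the settings that come up below (handler count, block cache and memstore fractions). The property names are the standard 0.90-era keys; the class name and fallback defaults are just illustrative:

```java
// Sketch: print the effective values the RS would resolve from hbase-site.xml.
// Second arguments are only fallbacks if the property is unset.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DumpSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    System.out.println("handler count        = "
        + conf.getInt("hbase.regionserver.handler.count", 10));
    System.out.println("block cache fraction = "
        + conf.getFloat("hfile.block.cache.size", 0.2f));
    System.out.println("memstore upper limit = "
        + conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f));
  }
}
```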
GC pauses don't follow any increasing pattern, so we can eliminate that. On store files, I had given confusing input earlier. We have 300 regions, 2 column families, and a _total_ of 1300-1500 files, which is still higher than the expected 600. It appears that after major compaction each region eventually gets down to a 2-file state, but other regions keep adding 2 more files because of ongoing writes, so the file count is almost always above 1200 (a rough sketch for counting these straight from HDFS is at the bottom of this mail).

*Abhijit Pol | Senior Rocket Scientist | [email protected] | 408.892.3377 p*

On Fri, Jun 10, 2011 at 12:43 PM, Stack <[email protected]> wrote:

> On Fri, Jun 10, 2011 at 10:10 AM, Abhijit Pol <[email protected]> wrote:
>> performed table flush at around peak timeouts 35%. Timeouts went up a bit
>> during flush and then dropped to 28% mark (compared to 10% timeouts after
>> restart of server). They started climbing up again and reached 35% mark in
>> an hour or so.
>
> This would seem to indicate that getting from memstore is at least
> part of the problem (but not the complete explanation. HBASE-3855
> should help. FYI, HBASE-3855 won't be in a release till 0.90.4 hbase)
>
>> we do major compact once a day during off peak night time. during major
>> compaction our timeouts go even higher (40%) and after major compaction they
>> come back to previous high and keep increasing from there.
>
> This would seem to say that the number of storefiles is NOT the issue.
>
>> for our main table with two column families, 300 regions across 10 machines,
>> each machine has around 1500 files over the course of day. After major
>> compaction they come down to 1300 or so.
>
> 30 regions per machine with two column families per region would seem
> to suggest that after a major compaction, you should have only 60
> storefiles or so. There are other regions on these regionservers and
> they make up the bulk of the storefiles?
>
>>> If you look in your regionserver logs, what do the stats on your block
>>> cache look like? Do the cache hits climb over time?
>>
>> cache hit starts out at around 80% on restart and then climbs up
>> and stabilizes at around 90% within an hr or two. (total RAM per RS, 98GB,
>> 60% given to HBase, 50% of which is block cache and 40% memstore)
>
> So, HBase has about 48G heap?
>
> Could it be GC frolics that are responsible for the other portion of
> the slow down? Seems like you are GC logging. Does overall pause
> time tend upward?
>
>>> The client may have gone away because we took too long to process the
>>> request. How many handlers are you running? Maybe the requests are
>>> backing up in rpc queues?
>>
>> we are using 500 handler count per region server. checked on RPC queue time
>> stats from metrics, it is zero most of the times, occasionally see single
>> digit number for it.
>> what are side effects of going for higher handler count? more memory?
>
> Contention if too many handlers in flight at the one time. RPC also
> keeps a queue per handler instance. Backed up queues are holding
> edits in memory. Doesn't seem like this is your issue though. Stuff
> seems to be moving right along. Anything else interesting in those
> rpc metrics? You see rising latency here? Can you finger any
> particular invocation?
>
>>> Why 0.90.0 and not 0.90.3 (has some fixes).
>>
>> yes, it's on our list. looks like we should do it sooner than later.
>
> One advantage of newer stuff is you can use the decommission script to
> change configs on a single RS to try things w/o disrupting cluster
> loading.
>
>> increase in timeout % is highly correlated to read request load. During the
>> day when read requests are high, rate of increase in timeouts is higher
>> compared to night. However, if we restart server at any point in time,
>> timeouts goes back to 10% and start increasing. And all these issues started
>> when we increased our read volume from peak 30k qps to peak 60k qps. Our
>> write volume is stable for a while at peak 3k qps.
>
> You might consider patching your hadoop with hdfs-347. See the issue.
> Lots of upsides. Downsides are its experimental!!!! and currently
> posted patch does not checksum (We are running this patch on our
> frontend. FB runs a version of this patch on at least one of their
> clusters).
>
>> (2)
>> whenever client gets response back from hbase (with found or missing key)
>> its always fast: max being less 10ms, and for timeout % read requests we
>> wait for 32ms (timeout % range being from 10%-35%).
>
> Can you figure more in here?
>
> St.Ack
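On (2): to be concrete about what we count as a timeout, here is a rough, simplified sketch of the accounting. The table name and row keys are illustrative, and our real client enforces the 32ms budget asynchronously rather than blocking like this:

```java
// Sketch: time each Get and count any response slower than the 32ms budget
// as a "timeout"; found or missing keys both come back and are timed the same.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class LatencyProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");        // table name is illustrative
    long budgetMs = 32;                                // our client-side budget
    int total = 0, over = 0;
    for (String key : new String[] {"row1", "row2", "row3"}) {  // sample keys
      long start = System.nanoTime();
      Result r = table.get(new Get(Bytes.toBytes(key)));  // found or missing, either is fine
      long elapsedMs = (System.nanoTime() - start) / 1000000L;
      total++;
      if (elapsedMs > budgetMs) over++;                // counted as a timeout
    }
    System.out.printf("%d/%d gets over %dms (%.1f%%)%n",
        over, total, budgetMs, 100.0 * over / total);
    table.close();
  }
}
```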

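And here is the storefile-counting sketch mentioned above. It simply walks the 0.90 on-disk layout (/hbase/<table>/<region>/<family>/<hfile>) in HDFS; the table path is illustrative, and non-directory entries such as .regioninfo are skipped:

```java
// Sketch: count storefiles for one table by listing its region/family dirs in HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CountStoreFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path tableDir = new Path("/hbase/mytable");        // table name is illustrative
    int files = 0;
    for (FileStatus region : fs.listStatus(tableDir)) {
      if (!region.isDir()) continue;                   // skip table-level files
      for (FileStatus family : fs.listStatus(region.getPath())) {
        if (!family.isDir()) continue;                 // skip .regioninfo etc.
        files += fs.listStatus(family.getPath()).length;
      }
    }
    System.out.println("storefiles under " + tableDir + ": " + files);
  }
}
```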