We did swapoff -a and then updated /etc/fstab to turn it off permanently. We observed that swapping was actually happening on the RSs, and since we turned it off the RSs have been much more stable.
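For anyone who wants the exact steps, this is roughly what we ran (the fstab line shown is just an example; your swap device will differ):

    # turn off all active swap immediately
    sudo swapoff -a

    # then comment out the swap entry in /etc/fstab so it stays off
    # across reboots -- e.g. a line like:
    #   /dev/sda2  none  swap  sw  0  0
    # becomes:
    #   #/dev/sda2  none  swap  sw  0  0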
I can tell you what we have; I'm not sure it is optimal, and in fact I'm looking for comments/suggestions from folks who have used it more:

64GB RAM ==> 85% given to the HBase heap (30% memstore, 60% block cache), plus 512MB for the DN and 512MB for the TT. We use a 64KB HDFS block size and an 8KB HBase block size, since our load is dominated by random reads. (I've put a config sketch of these settings at the bottom of this message, below the quoted thread.)

Any suggestions/comments?

On Mon, Oct 11, 2010 at 2:28 PM, Sean Bigdatafun <[email protected]> wrote:

> On Sun, Oct 10, 2010 at 12:28 PM, Abhijit Pol <[email protected]> wrote:
>
> > Thanks Stack.
> >
> > I think we have GC under control. We have CMS tuned to start early,
> > and we don't see the "slept x longer than y" warnings in the logs
> > anymore. We also have a higher zk timeout (150 seconds); guess we can
> > bump that up a bit.
> >
> > I was able to point to swap on a couple of RSs. We will disable swap
> > and see how that helps with the suicides. We have observed that RSs
> > on machines with swap disabled have been doing very well so far.
>
> Disabling swap at the OS level (i.e., resizing /swap to zero)? If we
> keep the sum of all the JVM heap sizes under the total physical memory
> size, does disabling it or not make any difference? Thanks.
> E.g., RS (6GB) + DN (2GB) + TaskTracker (1GB) + ... < 16GB seems quite
> achievable, doesn't it?
>
> Lots of people have mentioned that we should give the RegionServer
> machine a lot of physical memory. But I'd like to ask for a more
> detailed memory allocation breakdown, because in practice the RS runs
> alongside other services like ZooKeeper, the DataNode, the TaskTracker,
> etc.
>
> BTW, on beefy machines, how much heap have you allocated to the
> RegionServer?
>
> > Also, as you suggested, we will take the odd man out; we don't have
> > to have it in. Our master is already a low-key machine.
> >
> > --Abhi
> >
> > On Sat, Oct 9, 2010 at 11:12 PM, Stack <[email protected]> wrote:
> >
> > > On Sat, Oct 9, 2010 at 1:15 PM, Abhijit Pol <[email protected]> wrote:
> > > > We are testing with a 4-node HBase cluster, of which 3 machines
> > > > are identical with 64GB RAM and 6x1TB disks, and the 4th machine
> > > > has only 16GB RAM and 2x1TB disks.
> > > >
> > > > We observe (from server-side metrics) frequent latency spikes and
> > > > an RS suicide roughly every 8 hours on our 4th machine.
> > >
> > > How much heap have you given your servers? You could up your zk
> > > timeout or play with GC tunings -- if full GCs are the reason the
> > > RSs are committing hara-kiri.
> > >
> > > > We do have the overall heap size configured based on the total
> > > > RAM available, but all other configs are the same across RSs.
> > > >
> > > > Is there a way to hint to the master to distribute regions based
> > > > on available resources?
> > >
> > > No. Not currently.
> > >
> > > > We are using the 0.89.20100924 branch. We have slop at the
> > > > default 0.3 and roughly equal numbers of regions across all RSs.
> > >
> > > I'd suggest taking the odd man out of your cluster, or repurposing
> > > it as a master node. Usually clusters are homogeneous and much of
> > > the software assumes each node is equivalent. We've not had a
> > > chance to work on clusters made of differently spec'd machines.
> > >
> > > St.Ack
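P.S. Here is roughly what the layout described at the top of this message looks like in config form. This is a sketch, not our exact files -- double-check the property names against your HBase version, and 'mytable'/'cf' below are just placeholder names:

    # hbase-env.sh -- ~85% of 64GB for the RS heap (value is in MB)
    export HBASE_HEAPSIZE=55296

    # hbase-site.xml -- 30% of heap to memstores, 60% to block cache
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.3</value>
    </property>
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.6</value>
    </property>

    # hdfs-site.xml -- 64KB HDFS block size
    <property>
      <name>dfs.block.size</name>
      <value>65536</value>
    </property>

    # The HBase block size is per column family; set it when creating
    # the table from the hbase shell, e.g.:
    create 'mytable', {NAME => 'cf', BLOCKSIZE => '8192'}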
