On Sun, Oct 10, 2010 at 12:28 PM, Abhijit Pol <[email protected]> wrote:
> Thanks Stack.
>
> I think we have GC under control. We have CMS tuned to start early and
> don't see the "slept x, longer than y" messages in the logs anymore. We
> also have a higher zk timeout (150 seconds); guess we can bump that up a
> bit.
>
> I was able to point to swapping on a couple of RSs. Will disable the swap
> and see how that helps the suicides. We have observed RSs on machines
> with swap disabled doing very well so far.
>

Disabling SWAP at the OS level (i.e., resizing /swap to zero)? If we keep
the sum of each JVM's heap size under the total physical memory size, does
disabling it or not make any difference? Thanks.

e.g., RS (6GB) + DN (2GB) + TaskTracker (1GB) + ... < 16GB seems to be
quite possible, doesn't it?

Lots of people have mentioned that we should give a lot of physical memory
to the RegionServer machine. But I'd like to ask for a more detailed memory
allocation breakdown, because in practice the RS runs alongside other
services like ZooKeeper, DataNode, TaskTracker, etc.

BTW, on your beefy machines, how much heap have you allocated to the
RegionServer? (I've put a few concrete sketches of what I mean at the
bottom of this mail.)

> Also, as you suggested, we will take the odd man out. We don't have to
> have it in. Our master is already a low-key machine.
>
> --Abhi
>
>
> On Sat, Oct 9, 2010 at 11:12 PM, Stack <[email protected]> wrote:
>
> > On Sat, Oct 9, 2010 at 1:15 PM, Abhijit Pol <[email protected]> wrote:
> > > We are testing with a 4-node HBase cluster, of which 3 machines are
> > > identical with 64GB RAM and 6x1TB disks, and the 4th machine has only
> > > 16GB RAM and 2x1TB disks.
> > >
> > > We observe (from server-side metrics) frequent latency spikes and an
> > > RS suicide ~ every 8hrs on our 4th machine.
> > >
> >
> > How much heap have you given your servers? You could up your zk
> > timeout or play with GC tunings -- if full GCs are the reason RSs are
> > committing hara-kiri.
> >
> > > We do have the overall heap size configured based on the total RAM
> > > available, but all other configs are the same across RSs.
> > >
> > > Is there a way to hint the master to distribute regions based on
> > > available resources?
> > >
> >
> > No. Not currently.
> >
> > > We are using the 0.89.20100924 branch. We have slop
> > > (hbase.regions.slop) at the default 0.3 and a roughly equal number of
> > > regions across all RSs.
> > >
> >
> > I'd suggest taking the odd-man-out out of your cluster or repurposing
> > it as a master node. Usually clusters are homogeneous and much of the
> > software assumes each node is equivalent. We've not had a chance to
> > work on clusters made of differently spec'd machines.
> >
> > St.Ack
> >
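
P.S. To make my questions concrete: by "CMS tuned to start early" I assume
something along these lines in conf/hbase-env.sh (the flag values here are
my guesses, not the ones from this thread):

    # conf/hbase-env.sh -- illustrative GC settings, values are guesses
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
        -XX:+UseCMSInitiatingOccupancyOnly \
        -XX:CMSInitiatingOccupancyFraction=70"

And for the zk timeout, in conf/hbase-site.xml (the value is in
milliseconds, so 150 seconds would be 150000):

    <property>
      <name>zookeeper.session.timeout</name>
      <value>150000</value>
    </property>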
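
And by "disabling swap at the OS level" I mean something like the
following: either remove the swap device entirely, or just tell the kernel
to avoid it:

    # turn all swap devices off immediately
    sudo swapoff -a
    # then comment out the swap entry in /etc/fstab so it stays off
    # across reboots

    # alternatively, keep the device but discourage the kernel from
    # using it
    sudo sysctl vm.swappiness=0    # add to /etc/sysctl.conf to persist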
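
Finally, this is the kind of per-daemon breakdown I'm after for a 16GB
box -- the numbers below are made up, just to show the shape of it:

    RegionServer heap  (HBASE_HEAPSIZE in hbase-env.sh)       6 GB
    DataNode heap      (HADOOP_HEAPSIZE in hadoop-env.sh)     2 GB
    TaskTracker heap                                          1 GB
    MR child tasks     (e.g. 4 slots x 1 GB, set via
                        mapred.child.java.opts)               4 GB
    OS + page cache    (whatever is left)                    ~3 GB
                                                            ------
                                                             16 GB

If a budget like that really leaves headroom, does swap ever get touched at
all? That's the heart of my question.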
