Thanks Tatsuya. Will give "vm.swappiness" a shot.
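For anyone else following the thread, the change itself is a one-liner on most
Linux boxes (the syntax below assumes the usual sysctl tooling; adjust for your
distro):

    # take effect immediately, without a reboot
    sysctl -w vm.swappiness=0

    # persist across reboots, then reload
    echo "vm.swappiness = 0" >> /etc/sysctl.conf
    sysctl -p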
On Fri, Oct 15, 2010 at 4:42 PM, Tatsuya Kawano <[email protected]> wrote:

> Hi Abhi,
>
> > Can you give more input on what the drawbacks or risks of permanently
> > turning swap off might be, or what was the observed horror?
>
> Turning off swap means you'll meet the Linux OOM Killer more often. The
> OOM Killer (Out Of Memory Killer) tends to kill processes that use a
> larger memory space, so the RS can be targeted. Even worse, the OOM
> Killer can get stuck because of the low-memory situation. It will use up
> CPU time (%system) and you won't be able to ssh into the machine for a
> while.
>
> Instead of turning off swap, I would suggest lowering a kernel parameter
> called "vm.swappiness". It takes a number between 0 and 100; a higher
> value makes the kernel swap more often so that it can allocate more RAM
> to the file cache, and a lower value makes it swap less often. So you
> want a lower value.
>
> It defaults to 60 on many Linux distributions. Try setting it to 0.
>
> Thanks,
> Tatsuya
>
> --
> Tatsuya Kawano
> Tokyo, Japan
>
> http://twitter.com/tatsuya6502
>
>
> On 10/16/2010, at 3:12 AM, Abhijit Pol wrote:
>
> >>> we did swapoff -a and then updated fstab to permanently turn it off.
> >>
> >> You might not want to turn it off completely. One of the lads was
> >> recently talking about the horrors that can happen when there is no
> >> swap.
> >>
> >> But it sounds like you were doing over-eager swapping up to this?
> >
> > http://wiki.apache.org/hadoop/PerformanceTuning recommends removing
> > swap, and we had swap off on part of the cluster; those machines were
> > doing well in terms of RS crashes while the other machines were doing
> > lots of swapping. So we decided to turn it off for all RS machines.
> >
> > Can you give more input on what the drawbacks or risks of permanently
> > turning swap off might be, or what was the observed horror?
> >
> >>> we observed swap was actually happening on the RSs, and after we
> >>> turned it off we have much more stable RSs.
> >>>
> >>> I can tell you what we have; not sure it is optimal, and in fact I'm
> >>> looking for comments/suggestions from folks who have used it more:
> >>> 64GB RAM ==> 85% given to the HBASE HEAP (30% memstore, 60% block
> >>> cache), 512MB DN and 512MB TT
> >>
> >> So, I'm bad at math, but that's a heap of 50+GB? How's that working
> >> out for you? Have you played with GC tuning at all? You might give
> >> more to the DN and the TT since you have plenty -- and more to the
> >> OS... perhaps less to HBase?
> >>
> >> How many disks?
> >
> > We played with GC. What has worked well so far is starting CMS a
> > little early, at 40% occupancy; we removed the 6MB newgen restriction
> > and observed that it does not grow beyond 18MB, with a minor GC every
> > second instead of every 200ms in steady state (we might cap max newgen
> > if things go bad). So far all pauses have been small, less than a
> > second, and no full GC has kicked in.
> >
> > We have given more to HBase (and specifically to the block cache)
> > because we want 95% of read latencies below 20ms and our load is
> > random-read heavy with light read-modify-writes.
> > The rationale was to go for small HBase blocks (8KB), an HDFS block
> > size (64KB) that is larger than the HBase block but far smaller than
> > the default HDFS block size, and a large block cache (~37GB) to
> > improve the hit rate.
> > We did very limited experiments with different block sizes before
> > going with this configuration.
> >
> > We have 1GB for the DN. We don't run map-reduce much on this cluster,
> > so we have given 512MB to the TT. We have a separate Hadoop cluster
> > for all our MR and analytics needs.
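For anyone wanting to try the same GC setup: in hbase-env.sh it would look
roughly like the sketch below. The 40% figure and the "no newgen cap" choice
are from this thread; the heap number and the exact HotSpot flags are my
reading of "85% of 64GB" and "start CMS early", not the exact production
config.

    # hbase-env.sh (sketch)
    export HBASE_HEAPSIZE=55000            # MB; roughly 85% of 64GB RAM
    # Start CMS at 40% old-gen occupancy, and only at that threshold,
    # so concurrent collection kicks in well before the heap fills up.
    export HBASE_OPTS="$HBASE_OPTS \
      -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=40 \
      -XX:+UseCMSInitiatingOccupancyOnly"

The 30% memstore / 60% block cache split would live in hbase-site.xml, via
hbase.regionserver.global.memstore.upperLimit and hfile.block.cache.size, if
I remember the property names for this era right.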
> >
> > We have 6x1TB disks per machine.
> >
> >>> we have a 64KB HDFS block size
> >>
> >> Do you mean 64MB?
> >
> > It's 64KB. Our keys are random enough that there is very little chance
> > of exploiting block locality, so every miss in the block cache will
> > read one or more random HDFS blocks anyway, and hence it makes sense
> > to go for a lower HDFS block size. After getting HBASE-3006 in, things
> > improved a lot for us.
> >
> > We use large 128MB blocks for our analytics Hadoop cluster as it has
> > more sequential reads. Do you think a smaller size like 64KB might
> > actually be hurting us?
> >
> >> You've done the other stuff -- ulimits and xceivers?
> >
> > We have a 64k ulimit on all our Hadoop cluster machines, and xceivers
> > is set to 2048 for the HBase cluster.
> >
> >> How's it running for you?
> >
> > I will post some real numbers next week once we have had it running
> > for 7 days with the current config.
> >
> > I won't say we have nailed everything down, but it's better than what
> > we started with.
> >
> > Any input will be really helpful, as will anything you think we are
> > doing that is stupid or totally missing :-)
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
> http://twitter.com/tatsuya6502
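A footnote on the ulimit / xceivers settings mentioned above, since they live
in two different places (a sketch; the file locations are the usual
conventions, not something stated in the thread):

    # verify the open-file limit for the account that runs the daemons;
    # it should report 65536 for the "64k ulimit" mentioned above
    # (the limit itself is normally raised in /etc/security/limits.conf)
    ulimit -n

    # the xceiver count is an hdfs-site.xml property, not a shell setting:
    #   dfs.datanode.max.xcievers = 2048   (note the historical spelling)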
