On Fri, Oct 15, 2010 at 11:12 AM, Abhijit Pol <[email protected]> wrote:
> > > we did swapoff -a and then updated fstab to permanently turn it off.
> >
> > You might not want to turn it off completely. One of the lads was
> > recently talking about the horrors that can happen when there is no swap.
> >
> > But sounds like you were doing over-eager swapping up to this?
>
> http://wiki.apache.org/hadoop/PerformanceTuning recommends removing swap,
> and we had swap off on part of the cluster; those machines were doing well
> in terms of RS crashes while the other machines were doing lots of
> swapping. So we decided to turn it off for all RS machines.
>
> Can you give more input on what the drawbacks or risks of permanently
> turning swap off might be, or what the observed horror was?
>
> > > we observed swap was actually happening on RSs, and after we turned it
> > > off we have much more stable RSs.
> > >
> > > i can tell you what we have, not sure it is optimal; in fact I'm looking
> > > for comments/suggestions from folks who have used it more:
> > > 64GB RAM ==> 85% given to HBASE HEAP (30% memstore, 60% block cache),
> > > 512MB DN and 512MB TT
> >
> > So, I'm bad at math, but that's a heap of 50+GB? How's that working out
> > for you? You played with GC tuning at all? You might give more to
> > the DN and the TT since you have plenty -- and more to the OS...
> > perhaps less to hbase?
> >
> > How many disks?
>
> We played with GC. What has worked well so far is starting CMS a little
> early, at 40% occupancy; we removed the 6m newgen restriction and observed
> that we are not growing beyond 18mb, and minor GC now comes every second
> instead of every 200ms in steady state (we might cap maxnewgen if things
> go bad). So far all pauses are small, less than a second, and no full GC
> has kicked in.
>
> We have given more to HBase (and specifically to the block cache) because
> we want 95th-percentile read latencies below 20ms and our load is random
> read heavy with light read-modify-writes.
>
> The rationale was to go for small HBase blocks (8KB), an HDFS block size
> (64KB) that is larger than the HBase block but smaller than the default
> HDFS block size, and a large block cache (~37GB) to improve the hit rate.
> We did very limited experiments with different block sizes before going
> with this configuration.
>
> We have 1GB for the DN. We don't run map-reduce much on this cluster, so
> we have given 512MB to the TT. We have a separate Hadoop cluster for all
> our MR and analytics needs.
>
> We have 6x1TB disks per machine.
>
> > > we have 64KB HDFS block size
> >
> > Do you mean 64MB?
>
> It's 64KB. Our keys are random enough to have a very low chance of
> exploiting block locality. So every miss in the block cache will read one
> or more random HDFS blocks anyway, and hence it makes sense to go for a
> lower HDFS block size. After getting HBASE-3006 in, things improved a lot
> for us.

If this is your setup, your HDFS namenode is bound to OOM soon. (The
namenode's memory consumption is proportional to the number of blocks on
HDFS.) I guess you meant "hfile.min.blocksize.size" in ? That is a
different parameter from HDFS' block size, IMO. (need someone to confirm)

> We use large 128MB blocks for our analytics hadoop cluster as it has more
> seq. reads. Do you think a smaller size like 64KB might actually be
> hurting us?
>
> > You've done the other stuff -- ulimits and xceivers?
>
> We have a 64k ulimit for all our hadoop cluster machines, and xceivers is
> set to 2048 for the hbase cluster.
>
> > How's it running for you?
>
> I will post some real numbers next week when we have it running for 7
> days with the current config.
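For anyone following along, the CMS setup described above would look roughly
like the following in conf/hbase-env.sh. The exact numbers (heap size,
occupancy fraction) are my reading of the thread, not Abhijit's actual
config, so treat it as a sketch:

  # Sketch of the regionserver JVM options discussed in this thread; the
  # numbers are assumptions based on the discussion, not a copy of the
  # real config. ~85% of 64GB to the RS heap, CMS kicked off early at 40%
  # occupancy, and no -Xmn / -XX:MaxNewSize, matching the "removed the
  # newgen restriction" comment (cap it later if promotion gets worse).
  export HBASE_OPTS="$HBASE_OPTS -Xmx54g -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=40 -XX:+UseCMSInitiatingOccupancyOnly"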
>
> I won't say we have nailed down everything, but it's better than what we
> started with.
>
> Any input will be really helpful, or anything you think we are doing
> stupid or totally missing :-)
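On the swap question at the top of the thread: for reference, the procedure
Abhijit describes ("swapoff -a and then updated fstab") boils down to the
two steps below on each RS box. The fstab line is only an example; your
device name will differ.

  # Disable swap immediately on the running box.
  swapoff -a

  # Then comment out (or remove) the swap entry in /etc/fstab so it stays
  # off across reboots. Example line only; the device will differ per host:
  #   /dev/sda2   none   swap   sw   0   0

  # Verify nothing is still swapping.
  swapon -s
  free -m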

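And since ulimits and xceivers came up: the 64k nofile setting and the
dfs.datanode.max.xcievers=2048 property only help if the running daemons
actually picked them up, which is worth double-checking since limits set for
a login shell don't always reach daemons started from init scripts. A quick
check, assuming the stock HRegionServer/DataNode process names, one daemon
per box, and a 2.6.24+ kernel with /proc/<pid>/limits (run it as the daemon
user or root):

  # Confirm the regionserver really runs with the raised file-descriptor limit.
  cat /proc/$(pgrep -f HRegionServer)/limits | grep 'Max open files'

  # Same check for the datanode, which needs the headroom for xceiver threads.
  cat /proc/$(pgrep -f DataNode)/limits | grep 'Max open files'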