Are you doing some intensive tasks on the RegionServer side (which take more than the default timeout of 60 sec, iirc)? One can get these exceptions when the client-side socket connection is closed (probably a timeout on the client side). As per the exception, when the RegionServer tried to send the result of a multi call via its handler thread, the other side of the socket was already closed, so it failed to write any response.
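
If the work genuinely needs more than 60 sec, you could also raise the timeout on the client side rather than letting the connection drop. A minimal sketch, assuming your client honors the hbase.rpc.timeout key (check hbase-default.xml for your release); the table name and the 120-second value are just illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class PatientClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Give slow multi() calls more time before the client gives up
            // and closes the socket -- the server-side
            // ClosedChannelException is what a response written to an
            // abandoned connection looks like.
            conf.setInt("hbase.rpc.timeout", 120000); // ms; illustrative value

            HTable table = new HTable(conf, "some_table"); // hypothetical name
            // ... issue your gets/puts/multi calls as usual ...
            table.close();
        }
    }

Raising the timeout only hides the latency, of course; it is still worth finding out why the multi calls are slow.
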
Cheers,
Himanshu

On Tue, Aug 23, 2011 at 5:35 PM, Buttler, David <[email protected]> wrote:

> So, if you use 0.5 GB / mapper and 1 GB / reducer, your total memory
> consumption (minus HBase) on a slave node should be:
>   4 GB M/R tasks
>   1 GB OS -- just a guess
>   1 GB datanode
>   1 GB tasktracker
> Leaving you with up to 9 GB for your region servers. I would suggest
> bumping your region server RAM up to 8 GB, and leaving a GB for OS
> caching. [I am sure someone out there will tell me I am crazy]
>
> However, it is the log that is the most useful part of your email.
> Unfortunately I haven't seen that error before.
> Are you using the Multi methods a lot in your code?
>
> Dave
>
> -----Original Message-----
> From: Oleg Ruchovets [mailto:[email protected]]
> Sent: Tuesday, August 23, 2011 1:38 PM
> To: [email protected]
> Subject: Re: how to make tuning for hbase (every couple of days hbase
> region sever/s crashe)
>
> Thank you for the detailed response.
>
> On Tue, Aug 23, 2011 at 7:49 PM, Buttler, David <[email protected]> wrote:
>
> > Have you looked at the logs of the region servers? That is a good first
> > place to look.
> > How many regions are in your system?
>
> Region Servers:
>
> Address   Start Code     Load
> hadoop01  1314007529600  requests=0, regions=212, usedHeap=3171, maxHeap=3983
> hadoop02  1314007496109  requests=0, regions=207, usedHeap=2185, maxHeap=3983
> hadoop03  1314008874001  requests=0, regions=208, usedHeap=1955, maxHeap=3983
> hadoop04  1314008965432  requests=0, regions=209, usedHeap=2034, maxHeap=3983
> hadoop05  1314007496533  requests=0, regions=208, usedHeap=1970, maxHeap=3983
> hadoop06  1314008874036  requests=0, regions=208, usedHeap=1987, maxHeap=3983
> hadoop07  1314007496927  requests=0, regions=209, usedHeap=2118, maxHeap=3983
> hadoop08  1314007497034  requests=0, regions=211, usedHeap=2568, maxHeap=3983
> hadoop09  1314007497221  requests=0, regions=209, usedHeap=2148, maxHeap=3983
> master    1314008873765  requests=0, regions=208, usedHeap=2007, maxHeap=3962
> Total: servers: 10, requests=0, regions=2089
>
> Most of the time GC succeeds in cleaning up, but every 3-4 days used
> memory gets close to 4 GB, and there are a lot of exceptions like this:
>
> org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
> multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
> from 10.11.87.73:33737: output error
> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>   at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>   at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>   at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>
> > If you are using MSLAB, it reserves 2 MB/region as a buffer -- that can
> > add up when you have lots of regions.
> >
> > Given so little information all my guesses are going to be wild, but
> > they might help:
> > 4 GB may not be enough for your current load.
> > Have you considered changing your memory allocation, giving less to your
> > map/reduce jobs and more to HBase?
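
As a side note, those per-task heaps are normally pinned in the job configuration, so it is easy to verify what each task can actually allocate. A minimal sketch, assuming the split map/reduce properties available in 0.20-era Hadoop (older releases only expose the combined mapred.child.java.opts; the job name is made up):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class HeapBudget {
        public static Job createJob() throws IOException {
            Configuration conf = new Configuration();
            // 4 maps x 512 MB + 2 reduces x 1024 MB = 4 GB of task heap
            // per slave node, matching the numbers in this thread.
            conf.set("mapred.map.child.java.opts", "-Xmx512m");
            conf.set("mapred.reduce.child.java.opts", "-Xmx1024m");
            return new Job(conf, "hbase-load"); // hypothetical job name
        }
    }
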
>
> Interesting point. Can you advise on the relation between M/R memory
> allocation and HBase regions?
> Currently we have 512 MB per map (4 maps per machine) and 1024 MB per
> reduce (2 reducers per machine).
>
> > What is your key distribution like?
> > Are you writing to all regions equally, or are you hotspotting on one
> > region?
>
> Every day before running the job we manually allocate regions with
> lexicographic start and end keys to get a good distribution and prevent
> hot-spots.
>
> > Check your cell/row sizes. Are they really large (e.g. cells > 1 MB;
> > rows > 100 MB)? Increasing region size should help here, but there may
> > be an issue with your RAM allocation for HBase.
>
> I'll check, but I am almost sure that we have no rows > 100 MB. We changed
> the region size to 500 MB to prevent automatic splits (after a successful
> insert job we have ~200-250 MB files per region), and for the next day we
> allocate new regions.
>
> > Are you sure that you are not overloading the machine memory? How much
> > RAM do you allocate for map reduce jobs?
>
> 512 MB -- map
> 1024 MB -- reduce
>
> > How do you distribute your processes over machines? Does your master run
> > namenode, hmaster, jobtracker, and zookeeper, while your slaves run
> > datanode, tasktracker, and hregionserver?
>
> Exactly, we have that process distribution.
> We have 16 GB ordinary machines and 48 GB RAM for the master, so I am not
> sure that I understand your calculation; please clarify.
>
> > If so, then your memory allocation is:
> >   4 GB for regionserver
> >   1 GB for OS
> >   1 GB for datanode
> >   1 GB for tasktracker
> >   9/6 GB for M/R
> > So, are you sure that all of your m/r tasks take less than 1 GB?
> >
> > Dave
> >
> > -----Original Message-----
> > From: Oleg Ruchovets [mailto:[email protected]]
> > Sent: Tuesday, August 23, 2011 2:15 AM
> > To: [email protected]
> > Subject: how to make tuning for hbase (every couple of days hbase region
> > sever/s crashe)
> >
> > Hi,
> >
> > Our environment: HBase 0.90.2 on a 10-machine grid:
> >   master has 48 GB RAM
> >   slave machines have 16 GB RAM
> >   Region Server process has 4 GB RAM
> >   Zookeeper process has 2 GB RAM
> >   We have 4 maps / 2 reducers per machine
> >
> > We write from M/R jobs to HBase (2 jobs a day). For 3 months the system
> > worked without any problem, but now a region server crashes every 3-4
> > days.
> > What we have done so far:
> >   1) We run a major compaction manually once a day.
> >   2) We increased the region size to prevent automatic splits.
> >
> > Questions:
> > What is the way to tune HBase?
> > How do we debug such a problem? It is still not clear to me what the
> > root cause of the region server crashes is.
> >
> > We started from this post:
> > http://search-hadoop.com/m/HDoK22ikTCI/M%252FR+vs+hbase+problem+in+production&subj=M+R+vs+hbase+problem+in+production
> >
> > Regards,
> > Oleg
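
By the way, the daily manual region allocation described above can be scripted against the admin API instead of being done by hand. A minimal sketch, assuming the createTable overload that takes a key range (present in the 0.90 HBaseAdmin, iirc); the table name, family, and key range are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("daily_table"); // hypothetical
            desc.addFamily(new HColumnDescriptor("d"));                  // hypothetical family

            // Create the table pre-split into ~200 regions over an evenly
            // divided lexicographic key range, so that writes are spread
            // across all region servers from the first insert.
            admin.createTable(desc,
                    Bytes.toBytes("0000000000"),  // illustrative start key
                    Bytes.toBytes("zzzzzzzzzz"),  // illustrative end key
                    200);
        }
    }

Pre-splitting this way only helps if the row keys are actually uniform over the chosen range.
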

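The once-a-day manual major compaction from the original message can be scripted the same way. A minimal sketch; the table name is hypothetical, and note that majorCompact() only queues the request, it does not wait for the compaction to finish:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DailyMajorCompact {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // Ask the cluster to major-compact the whole table; the
            // request is queued asynchronously and this call returns
            // immediately.
            admin.majorCompact("daily_table"); // hypothetical table name
        }
    }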