From the log you posted on pastebin, I see the following. Can you check the namenode log to see what went wrong?
1. Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514 File does not exist. [Lease. Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25, pendingcreates: 1]

On Wed, Aug 28, 2013 at 8:00 AM, Ameya Kanitkar <[email protected]> wrote:

> Hi All,
>
> We have a very heavy MapReduce job that goes over an entire HBase table
> with more than 1 TB of data and exports all of it to HDFS (similar to the
> Export job, but with some additional custom code built in).
>
> However, this job is not very stable, and we often get the following
> error, after which the job fails:
>
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist
>         at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
>         at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
>
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.
>
> Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb
>
> We have changed the following settings in HBase to counter this problem,
> but the issue persists:
>
> <property>
>   <!-- Loaded from hbase-site.xml -->
>   <name>hbase.regionserver.lease.period</name>
>   <value>900000</value>
> </property>
>
> <property>
>   <!-- Loaded from hbase-site.xml -->
>   <name>hbase.rpc.timeout</name>
>   <value>900000</value>
> </property>
>
> We also reduced the number of mappers per RS to fewer than the number of
> CPUs available on the box.
>
> We also observed that once the problem happens, it happens multiple times
> on the same RS. All other regions are unaffected. But a different RS
> sees this problem on different days. There is no particular region
> causing this either.
>
> We are running 0.94.2 with cdh4.2.0.
>
> Any ideas?
>
> Ameya
