Hello,
This is the first time I am sending a query to the HBase mailing list. Hopefully
this is the correct group for HBase/Hadoop-related questions.
I am running HBase 0.92 and Hadoop 2.0 (CDH 4.1.3). Recently there was some
instability in my DNS service and host lookup requests failed occasionally.
During such failures, a random region server would shut itself down when it
encountered a fatal exception during a log roll operation. The DNS issue was
eventually resolved and the region server fatals stopped.
While trying to understand HBase/Hadoop behavior during network events/blips,
I found that the default retry policy used is TRY_ONCE_THEN_FAIL. Please
correct me if that's not the case.
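For reference, this is roughly how I understand a retry policy gets attached via Hadoop's org.apache.hadoop.io.retry package. The LookupClient interface below is just a made-up placeholder to show the wiring, not a real Hadoop class, so treat this as a sketch rather than what HBase actually does internally:

    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.io.retry.RetryPolicies;
    import org.apache.hadoop.io.retry.RetryPolicy;
    import org.apache.hadoop.io.retry.RetryProxy;

    // Hypothetical client interface, only to illustrate how a proxy is wrapped.
    interface LookupClient {
        String resolve(String host) throws java.io.IOException;
    }

    public class RetryPolicyExample {
        public static LookupClient wrap(LookupClient raw) {
            // The default behavior the stack trace suggests: one attempt, then fail.
            RetryPolicy tryOnce = RetryPolicies.TRY_ONCE_THEN_FAIL;

            // A more forgiving policy: up to 3 attempts, 2 seconds apart.
            RetryPolicy retryAFewTimes =
                RetryPolicies.retryUpToMaximumCountWithFixedSleep(3, 2, TimeUnit.SECONDS);

            // RetryProxy re-invokes the interface's methods according to the policy.
            return (LookupClient) RetryProxy.create(LookupClient.class, raw, retryAFewTimes);
        }
    }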
But then I was thinking that there could be more of these blips during network
or other infrastructure maintenance operations. Such maintenance should not
result in a region server going down. If the client simply attempted the host
lookup one more time, the request should succeed (a small sketch of what I
mean follows below).
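Something along these lines is what I have in mind: a small, hypothetical helper (not code from HBase/Hadoop) that retries the lookup a couple of times before giving up:

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class DnsRetry {
        // Hypothetical helper: retry a host lookup a few times, on the
        // assumption that a short DNS blip clears within a couple of seconds.
        static InetAddress resolveWithRetry(String host, int attempts, long sleepMillis)
                throws UnknownHostException, InterruptedException {
            UnknownHostException last = null;
            for (int i = 0; i < attempts; i++) {
                try {
                    return InetAddress.getByName(host);
                } catch (UnknownHostException e) {
                    last = e;                  // remember the failure
                    Thread.sleep(sleepMillis); // back off briefly before retrying
                }
            }
            throw last;
        }

        public static void main(String[] args) throws Exception {
            System.out.println(resolveWithRetry("hadoop0104111601", 3, 2000));
        }
    }

One thing I am not sure about: the JVM also caches negative DNS lookups (networkaddress.cache.negative.ttl, 10 seconds by default, as far as I know), so an immediate retry might still see the cached failure unless there is a short sleep between attempts.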
If anyone has had a similar experience, could they please share? Are there
options one can try to guard against such failures?
Maybe I am not thinking in the right direction, but this behavior makes me
feel that HBase (on HDFS) is sensitive to DNS service availability. DNS
unavailability for even a few seconds could bring down the entire cluster (a
rare chance, if all region servers attempt to roll their HLogs at the same
time).
Here is the stack trace:
2014-08-17 11:14:48,706 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hadoop0104111601,60020,1408273008941: IOE in log roller
java.io.IOException: cannot get log writer
    at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
    ... 4 more
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:122)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:148)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:233)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:321)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:319)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:319)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:432)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
    ... 5 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:120)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop0104111601
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
    at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:125)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:60)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:51)
    at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:137)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:389)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:356)
    at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:84)
    ... 23 more
Caused by: java.net.UnknownHostException: hadoop0104111601
    ... 33 more
thanks,
Arun