Hi everyone,

I've been running the TestDFSIO benchmark on HDFS with the following setup: 8 nodes (1 namenode with a co-located resource manager, 7 datanodes with co-located node managers), an HDFS block size of 32M, replication of 1, and 21 files of 1G each (i.e. 3 mappers per datanode). I run TestDFSIO ten times in a row, each run being a cycle of write, read and cleanup operations, and in some of the runs (though never the first) I get a LeaseExpiredException. I was hoping you could point me to where I might have gone wrong in my configuration; my HDFS config files are pretty vanilla, and I am using Hadoop 2.7.1.
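In case it helps, this is roughly how I drive each cycle. I'm reproducing it from memory, so treat the jar path and exact flags as a sketch rather than a verbatim copy of my script; dfs.blocksize (32M = 33554432) and dfs.replication (1) are set in hdfs-site.xml rather than passed per run:

  # write phase: 21 files of 1GB each
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar \
      TestDFSIO -write -nrFiles 21 -fileSize 1GB
  # read phase over the same files
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar \
      TestDFSIO -read -nrFiles 21 -fileSize 1GB
  # cleanup before the next cycle
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar \
      TestDFSIO -clean

Below is a stack trace with some context: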
...
15/11/10 11:44:15 INFO mapreduce.Job: Running job: job_1447152143064_0003
15/11/10 11:44:21 INFO mapreduce.Job: Job job_1447152143064_0003 running in uber mode : false
15/11/10 11:44:21 INFO mapreduce.Job:  map 0% reduce 0%
15/11/10 11:44:27 INFO mapreduce.Job:  map 5% reduce 0%
15/11/10 11:44:28 INFO mapreduce.Job:  map 38% reduce 0%
15/11/10 11:44:29 INFO mapreduce.Job:  map 48% reduce 0%
15/11/10 11:44:30 INFO mapreduce.Job:  map 57% reduce 0%
15/11/10 11:44:35 INFO mapreduce.Job:  map 73% reduce 0%
15/11/10 11:44:37 INFO mapreduce.Job:  map 86% reduce 0%
15/11/10 11:44:38 INFO mapreduce.Job:  map 86% reduce 19%
15/11/10 11:44:47 INFO mapreduce.Job: Task Id : attempt_1447152143064_0003_m_000008_0, Status : FAILED
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_18 (inode 16554): File does not exist. Holder DFSClient_attempt_1447152143064_0003_m_000008_0_690388761_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3431)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3236)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3074)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.ipc.Client.call(Client.java:1476)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
15/11/10 11:44:48 INFO mapreduce.Job:  map 83% reduce 19%
15/11/10 11:44:50 INFO mapreduce.Job:  map 89% reduce 22%
15/11/10 11:44:51 INFO mapreduce.Job:  map 100% reduce 22%
15/11/10 11:44:52 INFO mapreduce.Job:  map 100% reduce 100%
15/11/10 11:44:53 INFO mapreduce.Job: Job job_1447152143064_0003 completed successfully
15/11/10 11:44:53 INFO mapreduce.Job: Counters: 51
...

I am also seeing an extremely high standard deviation in the read rate (up to almost 100%), and the read running times vary widely (between 20s and 160s). The locality of the map task placement is also only roughly 15 out of 21. Could this be related to the exception above?

Thanks a lot in advance; I'm happy to supply any more information if you need it.

Robert

--
My GPG Key ID: 336E2680
