[ https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724186#comment-13724186 ]
rohithsharma commented on YARN-403: ----------------------------------- Hash verification is one authentication process at Reducer/NM for Http request/response i.e fetching map output from NodeManager's. One specific scenario I have observed above exception is bq. When app master is killed after map phase is completed. 2nd app master attempt start reducer phase which intern reducers request for fetching map output. In the above scenario, 1. 1st attemp AM has SecretKey.This key has sent to Node Manager during start of container(Map Task). Node Manager stores in memory i.e *ShuffleHandler.secretManager* which will be removed only after application is finished. After Map Task is completed, app master is killed. 2. When 2nd attempt app master started, new SecretKey is generated. This Secretkey is sent along with startContainer request to NodeManager and Reducer JVM is started by NM. Fetcher at reducer encrypt request url using SecretKey and sends fetch request to Node Manager where Map Task has run (Old SecretKey is present). At NodeManager , hash verification is verified with old 1st attempt app master SecretKey. This verification fails since keys are different. > Node Manager throws java.io.IOException: Verification of the hashReply failed > ----------------------------------------------------------------------------- > > Key: YARN-403 > URL: https://issues.apache.org/jira/browse/YARN-403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.0.2-alpha, 0.23.6 > Reporter: Devaraj K > Assignee: Omkar Vinit Joshi > > {code:xml} > 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle > failure > java.io.IOException: Verification of the hashReply failed > at > org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) > at > org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira