Very reasonable scenario, but the application I run does not delete the input files, so such a race condition could not manifest itself at any point.
Funnily enough, while experimenting we changed some local path permissions and it now seems to work. Thanks! :-)

On Tue, May 14, 2013 at 8:39 PM, Chris Nauroth <[email protected]> wrote:

> Is it possible that you have multiple MR jobs (or other HDFS clients)
> operating on the same file paths that could cause a conflict if run
> concurrently?
>
> At MR job submission time, the MR client identifies the set of input
> splits, which roughly correspond to the blocks of the input HDFS files.
> (This is a simplified description, because CombineFileInputFormat or your
> own custom InputFormat can complicate the picture, but this simplification
> is fine for our purposes.) When map tasks launch, they read from the input
> splits (the HDFS file blocks). If you have an MR job that decides one of
> its input splits needs block X, and then another process decides to delete
> the HDFS file containing block X before the map task that would read the
> block launches, then you'd have a race condition that could trigger a
> problem similar to this.
>
> Typically, the solution is to design applications such that concurrent
> deletes while reading from a particular HDFS file are not possible. For
> example, you might perform file deletion only after the MR job that consumes
> those files has finished, so that you know nothing else is reading while
> you're trying to delete.
>
> BlockMissingException could also show up if you've lost all replicas of a
> block, but this would be extremely rare for a typical deployment with a
> replication factor of 3.
>
> Hope this helps,
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
> On Tue, May 14, 2013 at 2:20 PM, Public Network Services <
> [email protected]> wrote:
>
>> Hi...
>>
>> I am getting a BlockMissingException in a fairly simple application with
>> a few mappers and reducers (see end of message).
>>
>> Looking around on the web has not helped much, including JIRA issues
>> HDFS-767 and HDFS-1907. The configuration variable
>>
>> - dfs.client.baseTimeWindow.waitOn.BlockMissingException
>>
>> does not seem to make a difference, either.
>>
>> The BlockMissingException occurs in some of the runs, while in others
>> execution completes normally, which suggests a possible concurrency issue.
>>
>> Any ideas?
>>
>> Thanks!
>>
>>
>> org.apache.hadoop.yarn.YarnException:
>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
>> BP-390546703...
>> file=...job.splitmetainfo
>> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1159)
>> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1013)
>> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:985)
>> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
>> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:694)
>> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:119)
>> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:904)
>> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:854)
>> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1070)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
>> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1066)
>> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1025)
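For reference, here is a minimal, hypothetical sketch of the ordering Chris describes in the quoted reply: delete the input files only after the MR job that consumes them has completed. The class name, paths, and argument handling are placeholders (not taken from the actual application), and the mapper/reducer setup is omitted since only the ordering matters here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ConsumeThenDelete {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);   // HDFS directory the job reads (placeholder)
        Path output = new Path(args[1]);  // HDFS directory the job writes (placeholder)

        Job job = Job.getInstance(conf, "consume-then-delete");
        job.setJarByClass(ConsumeThenDelete.class);
        // Mapper/Reducer classes omitted; this sketch only illustrates the ordering.
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        // Block until the job has finished reading every input split.
        boolean ok = job.waitForCompletion(true);

        // Delete the inputs only after the job has completed successfully,
        // so no map task can race against the delete and hit a missing block.
        if (ok) {
            FileSystem fs = FileSystem.get(conf);
            fs.delete(input, true); // recursive delete of the consumed inputs
        }
        System.exit(ok ? 0 : 1);
    }
}

The key point is that fs.delete() runs strictly after job.waitForCompletion(true) returns, so no map task can still be reading a block of the deleted files.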
