Siddhi Mehta created YARN-3578:
----------------------------------

             Summary: HistoryFileManager.scanDirectory() should check if the dateString path exists else it throw FileNotFoundException
                 Key: YARN-3578
                 URL: https://issues.apache.org/jira/browse/YARN-3578
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 2.3.0
            Reporter: Siddhi Mehta

The exception occurs when the job client tries to access counters for a recently completed job. Here is what I think is happening.

1. The job in question started and completed on 05/02/2015, so ideally the history file location should be /mapred/history/done/2015/05/02/{000002}/
2. Instead, HistoryFileManager looks at directory /mapred/history/done/2015/04/02/{000002}/ and fails. Looking at how the idToDateString cache is built in {code}org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanOldDirsForJob(JobId){code}, it appears that the key is independent of the RM start time. So if you had 2 jobs, job_RMstarttime1_0001 and job_RMstarttime2_0001, the idToDateString cache will have the following entry: 000000001 -> { job_RMstarttime1_0001historyDir, job_RMstarttime2_0001historyDir }
3. If job_RMstarttime1_0001 is older than "mapreduce.jobhistory.max-age-ms", we delete its history info from HDFS.
4. When we then query for job_RMstarttime2_0001, the scan still visits the deleted directory and fails with a FileNotFoundException.

Either the keys should be aware of the RM start time, or HistoryFileManager.scanDirectory should check that the path exists before it calls listStatus, to avoid the FileNotFoundException.

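The key collision described in the steps above can be sketched in isolation. This is a simplified model and not the Hadoop implementation: the class name, the job IDs with placeholder RM start times, and the helper methods serialKey, startTimeAwareKey, and register are all hypothetical, and the cache is reduced to a plain map from a nine-digit serial key to date-string directories.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class SerialKeyCollisionSketch {

    // Hypothetical stand-in for the cache: serial-number key -> date-string directories.
    static final Map<String, SortedSet<String>> idToDateString = new HashMap<>();

    // Key built from the job sequence number only; the RM start time is ignored.
    static String serialKey(String jobId) {
        String[] parts = jobId.split("_"); // "job", "<rmStartTime>", "<sequence>"
        return String.format("%09d", Integer.parseInt(parts[2]));
    }

    // Hypothetical alternative key that also carries the RM start time.
    static String startTimeAwareKey(String jobId) {
        String[] parts = jobId.split("_");
        return parts[1] + "_" + String.format("%09d", Integer.parseInt(parts[2]));
    }

    // Record the date-string directory under the serial-number-only key.
    static void register(String jobId, String dateString) {
        idToDateString
            .computeIfAbsent(serialKey(jobId), k -> new TreeSet<>())
            .add(dateString);
    }

    public static void main(String[] args) {
        // Placeholder RM start times; only the sequence number (0001) is shared.
        register("job_1111111111111_0001", "2015/04/02");
        register("job_2222222222222_0001", "2015/05/02");

        // Both jobs land under one key, so a lookup for the newer job
        // also returns the older (possibly deleted) directory.
        System.out.println(idToDateString);
        // prints {000000001=[2015/04/02, 2015/05/02]}

        // With the start time in the key, the two jobs no longer collide.
        System.out.println(startTimeAwareKey("job_1111111111111_0001"));
        System.out.println(startTimeAwareKey("job_2222222222222_0001"));
    }
}
```

In this model, a lookup for the second job iterates over both directories, which is why a missing 2015/04/02 directory can break a query for a job that finished on 2015/05/02.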
{code}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  // Proposed guard: return an empty list if the directory no longer exists.
  // exists() is provided by FileContext.Util, hence fc.util().
  if (!fc.util().exists(path)) {
    return jhStatusList;
  }
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{code}

Complete stack trace:

gslog`20150504141445.816``263424`0`0````````1892499996-10858515`754671855`/ex/UnhandledException.jsp`JAVA.FileNotFoundException -> java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:147) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:217) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:203) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:199) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:199) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:231) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at 
org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) Caused by: java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:205) at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:189) at org.apache.hadoop.fs.Hdfs$2.<init>(Hdfs.java:171) at org.apache.hadoop.fs.Hdfs.listStatusIterator(Hdfs.java:171) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1392) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1387) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1387) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:739) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:752) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanOldDirsForJob(HistoryFileManager.java:909) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getFileInfo(HistoryFileManager.java:938) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:132) ... 
18 more at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334)`hadoop.client.NetworkedHadoopJobClientImpl`getCounters`RATEOF-BLOCK Date: Mon May 04 14:14:45 GMT 2015 Hostname: gs0-app2-1-chi.ops.sfdc.net SettingsPath: prod.-.chi.gs0.app JavaVersion: 1.7.0_76 UniqueId: 1892499996-10858515 StackTraceId: 754671855 Allowed: 865202 Blocked: 3 ThreadId: 263424 ThreadName: /ex/UnhandledException.jsp Category: JAVA.FileNotFoundException SubCategory: BaseSfdcGack SourceClassName: hadoop.client.NetworkedHadoopJobClientImpl SourceMethodName: getCounters OrganizationId: null UserId: null Subject: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:147) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:217) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:203) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:199) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:199) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:231) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) Caused by: java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:205) at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:189) at org.apache.hadoop.fs.Hdfs$2.<init>(Hdfs.java:171) at org.apache.hadoop.fs.Hdfs.listStatusIterator(Hdfs.java:171) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1392) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1387) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1387) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:739) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:752) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanOldDirsForJob(HistoryFileManager.java:909) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getFileInfo(HistoryFileManager.java:938) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:132) ... 
18 more ExtendedMessage: null Instance: null Level: SEVERE Thrown: hadoop.client.HadoopClientException: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:147) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:217) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:203) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:199) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:199) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:231) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) 
Caused by: java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:205) at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:189) at org.apache.hadoop.fs.Hdfs$2.<init>(Hdfs.java:171) at org.apache.hadoop.fs.Hdfs.listStatusIterator(Hdfs.java:171) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1392) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1387) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1387) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:739) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:752) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanOldDirsForJob(HistoryFileManager.java:909) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getFileInfo(HistoryFileManager.java:938) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:132) ... 18 more Cause0: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. 
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:147) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:217) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:203) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:199) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:199) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:231) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) Caused by: java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. 
at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:205) at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:189) at org.apache.hadoop.fs.Hdfs$2.<init>(Hdfs.java:171) at org.apache.hadoop.fs.Hdfs.listStatusIterator(Hdfs.java:171) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1392) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1387) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1387) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:739) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:752) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanOldDirsForJob(HistoryFileManager.java:909) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getFileInfo(HistoryFileManager.java:938) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:132) ... 18 more Cause0-StackTrace: at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532) at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:183) at hadoop.client.NetworkedHadoopJobClientImpl.getJob(NetworkedHadoopJobClientImpl.java:133) at hadoop.client.NetworkedHadoopJobClientImpl.getCounters(NetworkedHadoopJobClientImpl.java:112) ... 110 shared with parent Cause1: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. 
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:147) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:217) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:203) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:199) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:199) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:231) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) Caused by: java.io.FileNotFoundException: File /mapred/history/done/2015/04/02/000002 does not exist. 
at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:205) at org.apache.hadoop.fs.Hdfs$DirListingIterator.<init>(Hdfs.java:189) at org.apache.hadoop.fs.Hdfs$2.<init>(Hdfs.java:171) at org.apache.hadoop.fs.Hdfs.listStatusIterator(Hdfs.java:171) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1392) at org.apache.hadoop.fs.FileContext$20.next(FileContext.java:1387) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1387) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:739) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:752) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanOldDirsForJob(HistoryFileManager.java:909) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getFileInfo(HistoryFileManager.java:938) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:132) ... 18 more Cause1-StackTrace: at org.apache.hadoop.ipc.Client.call(Client.java:1409) at org.apache.hadoop.ipc.Client.call(Client.java:1362) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy328.getJobReport(Unknown Source) at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133) at sun.reflect.GeneratedMethodAccessor4114.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:320) ... 115 shared with parent Option-RacNode: 2 RequestParameters: jobStatus=SUCCEEDED assignedHadoopJobId=job_1428500471449_2445