[ 
https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074636#comment-14074636
 ] 

Zhijie Shen commented on YARN-2262:
-----------------------------------

[~nishan], thanks for sharing the logs. I've done a preliminary investigation 
into the RM logs. It seems that the FS history store messed up after RM failed 
over. There're two types of exception:

1. The history file of application_1406035038624_0005 shouldn't occur, because 
before failover, I didn't see application_1406035038624_0005 was already 
started according to the log (or the log is not complete?). However, FS store 
found the history file on HDFS, and wanted to append more information into the 
file, but failed to open the file in append mode.
{code}
2014-07-23 17:00:03,066 ERROR 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
 Error when openning history file of application application_1406035038624_0005
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
 Failed to create file 
[/home/testos/timelinedata/generic-history/ApplicationHistoryDataRoot/application_1406035038624_0005]
 for [DFSClient_NONMAPREDUCE_-903472038_1] for client [10.18.40.84], because 
this file is already being created by [DFSClient_NONMAPREDUCE_1878412866_1] on 
[10.18.40.84]
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2549)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2378)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2613)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2576)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:537)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at $Proxy14.append(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:276)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at $Proxy15.append(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1569)
        at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1609)
        at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1597)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:320)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:316)
        at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
        at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:723)
        at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
{code}

2. It seems that this exception happened because the file was written by 
another process simultaneously. Perhaps, the previous RM that went to standby 
didn't close the outstanding history files properly.
{code}
2014-07-23 17:00:03,497 ERROR 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
 Error when openning history file of application application_1406002968974_0004
java.io.IOException: Output file not at zero offset.
        at org.apache.hadoop.io.file.tfile.BCFile$Writer.<init>(BCFile.java:288)
        at org.apache.hadoop.io.file.tfile.TFile$Writer.<init>(TFile.java:288)
        at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:728)
        at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
{code}

> Few fields displaying wrong values in Timeline server after RM restart
> ----------------------------------------------------------------------
>
>                 Key: YARN-2262
>                 URL: https://issues.apache.org/jira/browse/YARN-2262
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.4.0
>            Reporter: Nishan Shetty
>            Assignee: Naganarasimha G R
>         Attachments: Capture.PNG, Capture1.PNG, 
> yarn-testos-historyserver-HOST-10-18-40-95.log, 
> yarn-testos-resourcemanager-HOST-10-18-40-84.log, 
> yarn-testos-resourcemanager-HOST-10-18-40-95.log
>
>
> Few fields displaying wrong values in Timeline server after RM restart
> State:        null
> FinalStatus:  UNDEFINED
> Started:      8-Jul-2014 14:58:08
> Elapsed:      2562047397789hrs, 44mins, 47sec 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to