How have you implemented the failover? Also, can you attach the JobTracker HA 
logs? If you have implemented it using ZKFC, it would be interesting to look 
at the ZooKeeper logs as well.

Sent from my iPhone

> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <[email protected]> wrote:
> 
> (Redirecting to cdh-user, moving user@hadoop to bcc).
> 
> Hi Oren,
> 
> Can you attach slightly longer versions of the log files from both JTs? 
> Also, if this is recurring, it would be worth monitoring the JT heap usage 
> and GC times using jstat -gcutil <jt-pid>.
> 
> Thanks
> Karthik
> 
> 
> 
> 
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <[email protected]> wrote:
>> Hi,
>> We have two HA JobTrackers in active/standby mode (CDH4.2 on Ubuntu Server).
>> We had a problem during which the active node suddenly became standby, and 
>> the standby node's attempt to become active failed with a Java heap space 
>> error. Any ideas as to why the active node switched to standby?
>> 
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: 
>> Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: 
>> Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO 
>> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
>> standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping 
>> pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping 
>> infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted 
>> waiting to send params to server
>> java.lang.InterruptedException
>>         at 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>>         at 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>>         at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy9.getFileInfo(Unknown Source)
>>         at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>>         at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>>         at 
>> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>>         at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped 
>> [email protected]:50031
>> 
>> on standby node
>> 2014-01-22 06:50:05,010 INFO 
>> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO 
>> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping 
>> JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped 
>> [email protected]:50030
>> 2014-01-22 06:50:05,198 INFO 
>> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO 
>> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous 
>> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to 
>> hdfs://taykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO 
>> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>>  Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler 
>> configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, 
>> limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: 
>> Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO 
>> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>>  Starting expired delegation token remover thread, 
>> tokenRemoverScanInterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting 
>> RUNNING
>> 2014-01-22 06:52:06,560 INFO 
>> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server 
>> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive 
>> from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at 
>> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at 
>> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server 
>> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 
>> ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server 
>> handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at 
>> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at 
>> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 
>> Thanks,
>> Oren
> 
