The standby warning means the client tried to connect to the standby
NameNode (HA is probably enabled on that cluster). Maybe this also causes
the missing output. Could you attach the whole app log?
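For reference, with NameNode HA the clients should be pointed at the logical nameservice rather than a single NameNode host, so that reads fail over past the standby instead of warning on it. A minimal hdfs-site.xml / core-site.xml sketch (nameservice and host names here are illustrative, not from your cluster):

```xml
<!-- hdfs-site.xml: define the nameservice and its two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- lets the client retry against the other NameNode on StandbyException -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- core-site.xml: point fs.defaultFS at the nameservice, not a host -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
```

If the job's configuration instead carries a hard-coded `hdfs://namenode1:8020` style URI, the client can end up talking to whichever node happens to be standby, which would produce exactly the warning in your log.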

On Thu, Feb 25, 2016 at 5:03 PM, Johannes Zillmann <[email protected]> wrote:

> Hey guys,
>
> have a Tez job on a customer deployment which has very strange symptoms.
> Maybe you have some pointers for me what to look into!
>
> App master log:
> —————————————
>
> 2016-02-16 12:05:16,666 INFO [main] app.DAGAppMaster: Created DAGAppMaster 
> for application appattempt_1455209830330_35852_000001, versionInfo=[ 
> component=tez-dag, version=0.6.0, 
> revision=25e41bc0f29f5e25adbfc17258b00ad702a17888, 
> SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, 
> buildTime=2015-03-03T18:01:25Z ]
> 2016-02-16 12:05:16,807 INFO [main] Configuration.deprecation: 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2016-02-16 12:05:16,816 INFO [main] Configuration.deprecation: dfs.umaskmode 
> is deprecated. Instead, use fs.permissions.umask-mode
> 2016-02-16 12:05:16,987 WARN [main] util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 2016-02-16 12:05:17,134 INFO [main] app.DAGAppMaster: Comparing client 
> version with AM version, clientVersion=0.6.0, AMVersion=0.6.0
> 2016-02-16 12:05:18,158 WARN [main] ipc.Client: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby
> 2016-02-16 12:05:18,298 INFO [main] app.DAGAppMaster: Adding session token to 
> jobTokenSecretManager for application
> 2016-02-16 12:05:18,305 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.rm.container.AMContainerEventType for class 
> org.apache.tez.dag.app.rm.container.AMContainerMap
> 2016-02-16 12:05:18,306 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.rm.node.AMNodeEventType for class 
> org.apache.tez.dag.app.rm.node.AMNodeTracker
> 2016-02-16 12:05:18,308 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.DAGAppMasterEventType for class 
> org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler
> 2016-02-16 12:05:18,309 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.DAGEventType for class 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher
> 2016-02-16 12:05:18,310 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.VertexEventType for class 
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher
> 2016-02-16 12:05:18,311 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.TaskEventType for class 
> org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher
> 2016-02-16 12:05:18,312 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.TaskAttemptEventType for class 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher
> 2016-02-16 12:05:18,312 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.SpeculatorEventType for class 
> org.apache.tez.dag.app.DAGAppMaster$4
> 2016-02-16 12:05:18,312 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.dag.event.SpeculatorEventType for class 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler
> 2016-02-16 12:05:18,357 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.rm.AMSchedulerEventType for class 
> org.apache.tez.dag.app.rm.TaskSchedulerEventHandler
> 2016-02-16 12:05:18,359 INFO [main] event.AsyncDispatcher: Registering class 
> org.apache.tez.dag.app.rm.NMCommunicatorEventType for class 
> org.apache.tez.dag.app.launcher.ContainerLauncherImpl
> 2016-02-16 12:05:18,483 INFO [main] node.AMNodeTracker: 
> blacklistDisablePercent is 33, blacklistingEnabled: true, 
> maxTaskFailuresPerNode: 10
> 2016-02-16 12:05:18,485 INFO [main] launcher.ContainerLauncherImpl: Upper 
> limit on the thread pool size is 500
> 2016-02-16 12:05:18,485 INFO [main] history.HistoryEventHandler: Initializing 
> HistoryEventHandler
> 2016-02-16 12:05:18,489 INFO [main] impl.SimpleHistoryLoggingService: Log 
> file location for SimpleHistoryLoggingService not specified, defaulting to 
> containerLogDir=/u10/hadoop/yarn/log/application_1455209830330_35852/container_1455209830330_35852_01_000001
> 2016-02-16 12:05:18,490 INFO [main] impl.SimpleHistoryLoggingService: 
> Initializing SimpleHistoryLoggingService, 
> logFileLocation=/u10/hadoop/yarn/log/application_1455209830330_35852/container_1455209830330_35852_01_000001/history.txt.appattempt_1455209830330_35852_000001,
>  maxErrors=10
> 2016-02-16 12:05:18,490 INFO [main] recovery.RecoveryService: Initializing 
> RecoveryService
> 2016-02-16 12:05:18,491 INFO [main] history.HistoryEventHandler: 
> [HISTORY][DAG:N/A][Event:APP_LAUNCHED]: 
> applicationId=application_1455209830330_35852, appSubmitTime=1455642293317, 
> launchTime=1455642316658
> 2016-02-16 12:05:18,492 INFO [main] history.HistoryEventHandler: 
> [HISTORY][DAG:N/A][Event:AM_LAUNCHED]: 
> appAttemptId=appattempt_1455209830330_35852_000001, 
> appSubmitTime=1455642293317,
>
>
> … user specific code
>
> 2016-02-16 12:05:19,343 INFO [IPC Server handler 0 on 13239] 
> app.DAGAppMaster: Starting DAG submitted via RPC: Export job (349): 
> ClaimLineExportToDB2#export(Disconnected record stream)#export(Disconnected 
> record (d4e9fda0-0f6c-4e63-8d5b-e846334affdf)
> 2016-02-16 12:05:19,347 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.output.compress is deprecated. Instead, use 
> mapreduce.output.fileoutputformat.compress
> 2016-02-16 12:05:19,347 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: fs.default.name is deprecated. Instead, use 
> fs.defaultFS
> 2016-02-16 12:05:19,348 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.map.tasks.speculative.execution is 
> deprecated. Instead, use mapreduce.map.speculative
> 2016-02-16 12:05:19,348 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use 
> mapreduce.job.reduces
> 2016-02-16 12:05:19,349 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use 
> mapreduce.input.fileinputformat.split.minsize
> 2016-02-16 12:05:19,349 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.reduce.tasks.speculative.execution is 
> deprecated. Instead, use mapreduce.reduce.speculative
> 2016-02-16 12:05:19,351 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.output.compression.type is deprecated. 
> Instead, use mapreduce.output.fileoutputformat.compress.type
> 2016-02-16 12:05:19,352 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use 
> mapreduce.input.fileinputformat.split.maxsize
> 2016-02-16 12:05:19,354 INFO [IPC Server handler 0 on 13239] 
> Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, 
> use mapreduce.map.output.compress
> 2016-02-16 12:05:19,480 INFO [IPC Server handler 0 on 13239] 
> app.DAGAppMaster: Generating DAG graphviz file, 
> dagId=dag_1455209830330_35852_1, 
> filePath=/u01/hadoop/yarn/log/application_1455209830330_35852/container_1455209830330_35852_01_000001/dag_1455209830330_35852_1.dot
> 2016-02-16 12:05:19,482 INFO [IPC Server handler 0 on 13239] 
> common.TezUtilsInternal: Redirecting log file based on addend: 
> dag_1455209830330_35852_1
>
> —————————————
>
> DAG-log:
> --------------------------
>
> 2016-02-16 12:05:19,482 INFO [IPC Server handler 0 on 13239] 
> app.DAGAppMaster: Running DAG: Export job (349): 
> ClaimLineExportToDB2#export(Disconnected record stream)#export(Disconnected 
> record (d4e9fda0-0f6c-4e63-8d5b-e846334affdf)
> 2016-02-16 12:05:19,966 INFO [HistoryEventHandlingThread] 
> impl.SimpleHistoryLoggingService: Writing event DAG_SUBMITTED to history file
>
> … user specific code
>
> 2016-02-16 12:05:21,437 INFO [InputInitializer [MapChainVertex:Export job 
> (349): ClaimLineExportToDB2#export(Disconnected record stream) 
> (2b86fca4-215f-47ba-9303-cbd2ad9d676e)] #0] util.LoggingUtil: Setting root 
> logger level for hadoop task to WARN:
> 2016-02-16 12:05:21,722 WARN [AsyncDispatcher event handler] 
> hdfs.ClientContext: Existing client context 'default' does not match 
> requested configuration.  Existing: shortCircuitStreamsCacheSize = 256, 
> shortCircuitStreamsCacheExpiryMs = 300000, shortCircuitMmapCacheSize = 256, 
> shortCircuitMmapCacheExpiryMs = 3600000, shortCircuitMmapCacheRetryTimeout = 
> 300000, shortCircuitCacheStaleThresholdMs = 1800000, socketCacheCapacity = 
> 16, socketCacheExpiry = 3000, shortCircuitLocalReads = false, 
> useLegacyBlockReaderLocal = false, domainSocketDataTraffic = false, 
> shortCircuitSharedMemoryWatcherInterruptCheckMs = 60000, Requested: 
> shortCircuitStreamsCacheSize = 4096, shortCircuitStreamsCacheExpiryMs = 
> 300000, shortCircuitMmapCacheSize = 256, shortCircuitMmapCacheExpiryMs = 
> 3600000, shortCircuitMmapCacheRetryTimeout = 300000, 
> shortCircuitCacheStaleThresholdMs = 1800000, socketCacheCapacity = 16, 
> socketCacheExpiry = 3000, shortCircuitLocalReads = true, 
> useLegacyBlockReaderLocal = false, domainSocketDataTraffic = false, 
> shortCircuitSharedMemoryWatcherInterruptCheckMs = 60000
> 2016-02-16 12:05:21,732 WARN [AsyncDispatcher event handler] ipc.Client: 
> Exception encountered while connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby
> 2016-02-16 12:08:12,554 WARN [AsyncDispatcher event handler] ipc.Client: 
> Exception encountered while connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby
>
> --------------------------
>
>
> Latest line of the (single) task:
> --------------------------
>
> ...
>
> 2016-02-16 12:07:00,166 INFO [Initializer 1] 
> runtime.LogicalIOProcessorRuntimeTask: Initializing Output using OutputSpec: 
> { destinationVertexName=output, physicalEdgeCount=0, 
> outputClassName=datameer.plugin.tez.output.DoNothingOutput }
> 2016-02-16 12:07:00,168 INFO [Initializer 0] 
> runtime.LogicalIOProcessorRuntimeTask: Creating Input
> 2016-02-16 12:07:00,168 INFO [Initializer 1] 
> runtime.LogicalIOProcessorRuntimeTask: Creating Output
> 2016-02-16 12:07:00,175 INFO [Initializer 1] 
> runtime.LogicalIOProcessorRuntimeTask: Initializing Output with dest edge: 
> output
> 2016-02-16 12:07:00,176 INFO [Initializer 0] 
> runtime.LogicalIOProcessorRuntimeTask: Initializing Input with src edge: 
> Input:Export job (349): ClaimLineExportToDB2#export(Disconnected record 
> stream)
> 2016-02-16 12:07:00,176 INFO [Initializer 0] 
> resources.MemoryDistributor$RequestorInfo: Received request: 0, type: INPUT, 
> componentVertexName: Input:Export job (349): 
> ClaimLineExportToDB2#export(Disconnected record stream)
> 2016-02-16 12:07:00,176 INFO [Initializer 1] 
> resources.MemoryDistributor$RequestorInfo: Received request: 0, type: OUTPUT, 
> componentVertexName: output
> 2016-02-16 12:07:00,177 INFO [Initializer 1] 
> runtime.LogicalIOProcessorRuntimeTask: Initialized Output with dest edge: 
> output
>
>
> … user specific code
>
> 2016-02-16 12:07:01,871 INFO [TezChild] util.LoggingUtil: Setting root
> logger level for hadoop task to WARN:
>
> 2016-02-16 12:07:03,990 WARN [TezChild] ipc.Client: Exception encountered 
> while connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby
>
> —————————————
>
> The client log contains a shutdown exception:
> —————————————
>
> INFO [2016-02-16 12:13:30.412] [MrPlanRunnerV2] (TezClient.java:470) - 
> Shutting down Tez Session, sessionName=Export job (349): 
> ClaimLineExportToDB2#export(Disconnected record stream)#export(Disconnected 
> record (d4e9fda0-0f6c-4e63-8d5b-e846334affdf), 
> applicationId=application_1455209830330_35852
>  INFO [2016-02-16 12:13:30.581] [MrPlanRunnerV2] (TezClientUtils.java:838) - 
> Application not running, applicationId=application_1455209830330_35852, 
> yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, 
> trackingUrl=http://xxx:8088/proxy/application_1455209830330_35852/, 
> diagnostics=Session stats:submittedDAGs=1, successfulDAGs=1, failedDAGs=0, 
> killedDAGs=0
>
>  INFO [2016-02-16 12:13:30.581] [MrPlanRunnerV2] (TezClient.java:484) - 
> Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, 
> applicationId=application_1455209830330_35852, yarnApplicationState=FINISHED, 
> finalApplicationStatus=SUCCEEDED, 
> trackingUrl=http://xxx:8088/proxy/application_1455209830330_35852/, 
> diagnostics=Session stats:submittedDAGs=1, successfulDAGs=1, failedDAGs=0, 
> killedDAGs=0
>
>       at 
> org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:839)
>       at org.apache.tez.client.TezClient.getSessionAMProxy(TezClient.java:669)
>       at org.apache.tez.client.TezClient.stop(TezClient.java:476)
>       at 
> datameer.plugin.tez.session.TezSessionImpl.close(TezSessionImpl.java:69)
>       at 
> datameer.plugin.tez.session.PoolingTezSessionFactory.shutdownSession(PoolingTezSessionFactory.java:173)
>       at 
> datameer.plugin.tez.session.PoolingTezSessionFactory.shutdownSessions(PoolingTezSessionFactory.java:167)
>       at 
> datameer.plugin.tez.session.PoolingTezSessionFactory.close(PoolingTezSessionFactory.java:162)
>       at datameer.com.google.common.io.Closer.close(Closer.java:214)
>       at 
> datameer.plugin.tez.smart.SmartClusterJobFlowResolver.close(SmartClusterJobFlowResolver.java:147)
>       at datameer.com.google.common.io.Closer.close(Closer.java:214)
>       at datameer.dap.sdk.util.IoUtil.close(IoUtil.java:330)
>       at 
> datameer.dap.common.graphv2.mixedframework.MixedClusterSession.commitJob(MixedClusterSession.java:65)
>       at 
> datameer.dap.common.graphv2.ClusterSession.commitJob(ClusterSession.java:81)
>       at 
> datameer.dap.common.graphv2.MrPlanRunnerV2.run(MrPlanRunnerV2.java:97)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at 
> datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:96)
>       at 
> datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:173)
>       at datameer.dap.common.security.RunAsThread.run(RunAsThread.java:22)
>  INFO [2016-02-16 12:13:30.582] [MrPlanRunnerV2] (TezClient.java:489) - Could 
> not connect to AM, killing session via YARN, sessionName=Export job (349): 
> ClaimLineExportToDB2#export(Disconnected record stream)#export(Disconnected 
> record (d4e9fda0-0f6c-4e63-8d5b-e846334affdf), 
> applicationId=application_1455209830330_35852
>
>
> —————————————
>
>
> The counters and the events show that:
> - the correct number of input records was ingested
> - the expected output records are missing (zero records were produced,
> which is reflected both in the counters and in the output itself)
> - Tez treats both the task and the DAG as successful
>
> The customer says the symptoms of the job are always the same:
> - there is that standby warning
> - there is the problem with the shutdown
> - (and the data is missing)
>
> To me it is still unclear whether the strange behavior comes from Tez or
> from our application logic.
> The same job succeeds when executed with pure MapReduce.
> Is the standby warning something to worry about?
>
> best
> Johannes
>



-- 
Best Regards

Jeff Zhang
