java.lang.Exception: Container 
[pid=17248,containerID=container_1597847003686_12235_01_001336] is running 
beyond physical memory limits. Current usage: 5.0 GB of 5 GB physical memory 
used; 7.0 GB of 25 GB virtual memory used. Killing container.
Dump of the process-tree for container_1597847003686_12235_01_001336 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 17283 17248 17248 17248 (java) 1025867 190314 7372083200 1311496 
/usr/local/jdk1.8/bin/java -Xmx2147483611 -Xms2147483611 
-XX:MaxDirectMemorySize=590558009 -XX:MaxMetaspaceSize=268435456 -server 
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=75 -XX:ParallelGCThreads=4 
-XX:+AlwaysPreTouch -XX:NewRatio=1 -DjobName=fastmidu-deeplink-tuid-20200203 
-Dlog.file=/data1/yarn/containers/application_1597847003686_12235/container_1597847003686_12235_01_001336/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=456340281b -D 
taskmanager.memory.network.min=456340281b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=1825361124b -D taskmanager.cpu.cores=5.0 -D 
taskmanager.memory.task.heap.size=2013265883b -D 
taskmanager.memory.task.off-heap.size=0b --configDir . 
-Djobmanager.rpc.address=di-h4-dn-134.h.ab1.qttsite.net -Dweb.port=0 
-Dweb.tmpdir=/tmp/flink-web-f63d543b-a75a-4dc4-be93-979eebd8062d 
-Djobmanager.rpc.port=43423 -Drest.address=di-h4-dn-134.h.ab1.qttsite.net 
    |- 17248 17246 17248 17248 (bash) 0 0 116015104 353 /bin/bash -c 
/usr/local/jdk1.8/bin/java -Xmx2147483611 -Xms2147483611 
-XX:MaxDirectMemorySize=590558009 -XX:MaxMetaspaceSize=268435456 -server 
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=75 -XX:ParallelGCThreads=4 
-XX:+AlwaysPreTouch -XX:NewRatio=1 -DjobName=fastmidu-deeplink-tuid-20200203 
-Dlog.file=/data1/yarn/containers/application_1597847003686_12235/container_1597847003686_12235_01_001336/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=456340281b -D 
taskmanager.memory.network.min=456340281b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=1825361124b -D taskmanager.cpu.cores=5.0 -D 
taskmanager.memory.task.heap.size=2013265883b -D 
taskmanager.memory.task.off-heap.size=0b --configDir . 
-Djobmanager.rpc.address='di-h4-dn-134.h.ab1.qttsite.net' -Dweb.port='0' 
-Dweb.tmpdir='/tmp/flink-web-f63d543b-a75a-4dc4-be93-979eebd8062d' 
-Djobmanager.rpc.port='43423' -Drest.address='di-h4-dn-134.h.ab1.qttsite.net' 
1> 
/data1/yarn/containers/application_1597847003686_12235/container_1597847003686_12235_01_001336/taskmanager.out
 2> 
/data1/yarn/containers/application_1597847003686_12235/container_1597847003686_12235_01_001336/taskmanager.err
 

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

    at 
org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:343)
    at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
    at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
    at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
    at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

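Hi,

We run Flink 1.10.1 in production. Jobs restart frequently, and the UI then shows the error above.

Our current workaround is to increase the memory of each TaskManager. Are there other ways to handle this? Is there a concrete, practical way to troubleshoot it, and what is the actual cause?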
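As a starting point for troubleshooting, the numbers in the dump can be added up. The sketch below is only arithmetic on values copied from the log above; it does not identify the cause. It assumes a 4096-byte page size for converting RSSMEM_USAGE(PAGES) to bytes (typical on Linux x86-64, but not stated in the log), and the variable names are purely illustrative.

```python
# Values copied from the container dump above.
CONTAINER_LIMIT = 5 * 1024**3      # "5 GB physical memory"

heap = 2147483611                  # -Xmx / -Xms
direct = 590558009                 # -XX:MaxDirectMemorySize
metaspace = 268435456              # -XX:MaxMetaspaceSize
managed = 1825361124               # taskmanager.memory.managed.size

# Flink components that make up the two JVM limits above.
framework_heap = 134217728         # taskmanager.memory.framework.heap.size
task_heap = 2013265883             # taskmanager.memory.task.heap.size
framework_off_heap = 134217728     # taskmanager.memory.framework.off-heap.size
task_off_heap = 0                  # taskmanager.memory.task.off-heap.size
network = 456340281                # taskmanager.memory.network.min/max

# Sanity check: the JVM flags match the sum of the Flink settings.
assert framework_heap + task_heap == heap
assert framework_off_heap + task_off_heap + network == direct

# Memory that is explicitly budgeted, and what is left inside the container
# for everything else (thread stacks, code cache, GC structures, other
# native allocations).
budgeted = heap + direct + metaspace + managed
headroom = CONTAINER_LIMIT - budgeted

# Resident memory YARN measured for the process tree when it killed it.
PAGE_SIZE = 4096                   # assumption: not stated in the log
rss_pages = {"java": 1311496, "bash": 353}
rss_bytes = sum(rss_pages.values()) * PAGE_SIZE
overshoot = rss_bytes - CONTAINER_LIMIT

MIB = 1024**2
print(f"budgeted : {budgeted / MIB:8.1f} MiB")
print(f"headroom : {headroom / MIB:8.1f} MiB")
print(f"rss      : {rss_bytes / MIB:8.1f} MiB")
print(f"overshoot: {overshoot / MIB:8.1f} MiB")
```

Under these assumptions, roughly 512 MiB of the 5 GB container is left for memory that none of the listed limits cap, and the measured resident size exceeds the container limit by only a few MiB. Note also that with -Xms equal to -Xmx and -XX:+AlwaysPreTouch, the whole 2 GiB heap is resident from startup, so the heap part of the budget is always fully used. Which unbudgeted or native allocation actually grows past the headroom cannot be read from this log alone.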

