Re: Re:flink s3 checkpoint 一直IN_PROGRESS(100%)直到失败

2022-03-08 文章 Yun Tang
Hi

一般是卡在最后一步从JM写checkpoint meta上面了,建议使用jstack等工具检查一下JM的cpu栈,看问题出在哪里。


祝好
唐云

From: Sun.Zhu <17626017...@163.com>
Sent: Tuesday, March 8, 2022 14:12
To: user-zh@flink.apache.org 
Subject: Re:flink s3 checkpoint 一直IN_PROGRESS(100%)直到失败

图挂了

https://postimg.cc/Z9XdxwSk













在 2022-03-08 14:05:39,"Sun.Zhu" <17626017...@163.com> 写道:

hi all,
flink 1.13.2,将checkpoint 写到S3但是一直成功不了,一直显示IN_PROGRESS,直到超时失败,有大佬遇到过吗?








?????? Flink????????????

2022-03-08 文章 hjw
streaming api ??sql api 
streaming api




----
??: 
   "user-zh"

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/

hjw <1010445...@qq.com.invalid ??2022??3??9?? 01:32??

 sql??SELECT color, sum(id) FROM T GROUP BY
 
colorFlinkTgroup
 by 
key??color)??Flink???

Re: flink on yarn任务停止发生异常

2022-03-08 文章 Jiangang Liu
异常提示的很明确了,做savepoint的过程中有的task不在running状态,你可以看下你的作业是否发生了failover。

QiZhu Chan  于2022年3月8日周二 17:37写道:

> Hi,
>
> 各位社区大佬们,帮忙看一下如下报错是什么原因造成的?正常情况下客户端日志应该返回一个savepoint路径,但却出现如下异常日志,同时作业已被停止并且查看hdfs有发现当前job产生的savepoint文件
>
>
>
>
>


Re: Flink计算机制疑问

2022-03-08 文章 Jiangang Liu
只存计算结果,来一条数据更新一次状态并且下发出去。具体可以参考下state:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/

hjw <1010445...@qq.com.invalid> 于2022年3月9日周三 01:32写道:

> 如下一段sql:SELECT color, sum(id) FROM T GROUP BY
> colorFlink在实际计算中会将T流整个存入状态里,流中来一条数据触发一次全流计算。亦或是状态只存计算结果,来了新的一条数据,在原来同group
> by key(color)结果进行加减即可。这种具体Flink的运行机制请问有文档翻阅或者有规律进行总结吗?谢谢。


Re: Re: k8s native session 问题咨询

2022-03-08 文章 Yang Wang
你用新版本试一下,看着是已经修复了

https://issues.apache.org/jira/browse/FLINK-19212

Best,
Yang

崔深圳  于2022年3月9日周三 10:31写道:

>
>
>
> web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server
> side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException:
> Failed to serialize the result for RPC call :
> requestMultipleJobDetails.\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:417)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$sendAsyncResponse$2(AkkaRpcActor.java:373)\n\tat
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
> java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:848)\n\tat
> java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2168)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.sendAsyncResponse(AkkaRpcActor.java:365)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:332)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)\n\tat
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)\n\tat
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)\n\tat
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)\n\tat
> scala.PartialFunction.applyOrElse(PartialFunction.scala:123)\n\tat
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)\n\tat
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)\n\tat
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)\n\tat
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)\n\tat
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)\n\tat
> akka.actor.Actor.aroundReceive(Actor.scala:537)\n\tat
> akka.actor.Actor.aroundReceive$(Actor.scala:535)\n\tat
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)\n\tat
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)\n\tat
> akka.actor.ActorCell.invoke(ActorCell.scala:548)\n\tat
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)\n\tat
> akka.dispatch.Mailbox.run(Mailbox.scala:231)\n\tat
> akka.dispatch.Mailbox.exec(Mailbox.scala:243)\n\tat
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)\n\tat
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)\n\tat
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)\n\tat
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)\nCaused
> by: java.io.NotSerializableException: java.util.HashMap$Values\n\tat
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)\n\tat
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)\n\tat
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)\n\tat
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)\n\tat
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)\n\tat
> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)\n\tat
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:632)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcSerializedValue.valueOf(AkkaRpcSerializedValue.java:66)\n\tat
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:400)\n\t...
> 29 more\n\nEnd of exception on server side"
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> 在 2022-03-09 09:56:21,"yu'an huang"  写道:
> >你好,路由到非master节点会有什么问题吗,非master节点在处理请求时应该会通过HA服务找到Active的Job manager,
> 然后向Active Job manager拿到结果再返回给client.
> >
> >> On 7 Mar 2022, at 7:46 PM, 崔深圳  wrote:
> >>
> >> k8s native session 模式下,配置了ha,job_manager 的数量为3,然后web ui,通过rest
> service访问,总是路由到非master节点,有什么办法使其稳定吗?
> >
>


Re:Re: k8s native session 问题咨询

2022-03-08 文章 崔深圳



web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server 
side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Failed to 
serialize the result for RPC call : requestMultipleJobDetails.\n\tat 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:417)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$sendAsyncResponse$2(AkkaRpcActor.java:373)\n\tat
 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
 
java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:848)\n\tat
 
java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2168)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.sendAsyncResponse(AkkaRpcActor.java:365)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:332)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)\n\tat
 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)\n\tat
 akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)\n\tat 
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)\n\tat 
scala.PartialFunction.applyOrElse(PartialFunction.scala:123)\n\tat 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)\n\tat 
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)\n\tat 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)\n\tat 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)\n\tat 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)\n\tat 
akka.actor.Actor.aroundReceive(Actor.scala:537)\n\tat 
akka.actor.Actor.aroundReceive$(Actor.scala:535)\n\tat 
akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)\n\tat 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)\n\tat 
akka.actor.ActorCell.invoke(ActorCell.scala:548)\n\tat 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)\n\tat 
akka.dispatch.Mailbox.run(Mailbox.scala:231)\n\tat 
akka.dispatch.Mailbox.exec(Mailbox.scala:243)\n\tat 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)\n\tat 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)\n\tat
 java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)\n\tat 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)\nCaused
 by: java.io.NotSerializableException: java.util.HashMap$Values\n\tat 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)\n\tat 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)\n\tat
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)\n\tat 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)\n\tat
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)\n\tat 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)\n\tat 
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:632)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcSerializedValue.valueOf(AkkaRpcSerializedValue.java:66)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:400)\n\t...
 29 more\n\nEnd of exception on server side"














在 2022-03-09 09:56:21,"yu'an huang"  写道:
>你好,路由到非master节点会有什么问题吗,非master节点在处理请求时应该会通过HA服务找到Active的Job manager, 然后向Active 
>Job manager拿到结果再返回给client.
>
>> On 7 Mar 2022, at 7:46 PM, 崔深圳  wrote:
>> 
>> k8s native session 模式下,配置了ha,job_manager 的数量为3,然后web ui,通过rest 
>> service访问,总是路由到非master节点,有什么办法使其稳定吗?
>


Re:Re: k8s native session 问题咨询

2022-03-08 文章 崔深圳
web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server 
side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Failed to 
serialize the result for RPC call : requestMultipleJobDetails.\n\tat 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:417)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$sendAsyncResponse$2(AkkaRpcActor.java:373)\n\tat
 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
 
java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:848)\n\tat
 
java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2168)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.sendAsyncResponse(AkkaRpcActor.java:365)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:332)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)\n\tat
 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)\n\tat
 akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)\n\tat 
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)\n\tat 
scala.PartialFunction.applyOrElse(PartialFunction.scala:123)\n\tat 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)\n\tat 
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)\n\tat 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)\n\tat 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)\n\tat 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)\n\tat 
akka.actor.Actor.aroundReceive(Actor.scala:537)\n\tat 
akka.actor.Actor.aroundReceive$(Actor.scala:535)\n\tat 
akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)\n\tat 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)\n\tat 
akka.actor.ActorCell.invoke(ActorCell.scala:548)\n\tat 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)\n\tat 
akka.dispatch.Mailbox.run(Mailbox.scala:231)\n\tat 
akka.dispatch.Mailbox.exec(Mailbox.scala:243)\n\tat 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)\n\tat 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)\n\tat
 java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)\n\tat 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)\nCaused
 by: java.io.NotSerializableException: java.util.HashMap$Values\n\tat 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)\n\tat 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)\n\tat
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)\n\tat 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)\n\tat
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)\n\tat 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)\n\tat 
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:632)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcSerializedValue.valueOf(AkkaRpcSerializedValue.java:66)\n\tat
 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:400)\n\t...
 29 more\n\nEnd of exception on server side"
在 2022-03-09 09:56:21,"yu'an huang"  写道:
>你好,路由到非master节点会有什么问题吗,非master节点在处理请求时应该会通过HA服务找到Active的Job manager, 然后向Active 
>Job manager拿到结果再返回给client.
>
>> On 7 Mar 2022, at 7:46 PM, 崔深圳  wrote:
>> 
>> k8s native session 模式下,配置了ha,job_manager 的数量为3,然后web ui,通过rest 
>> service访问,总是路由到非master节点,有什么办法使其稳定吗?
>


Re: k8s native session 问题咨询

2022-03-08 文章 yu'an huang
你好,路由到非master节点会有什么问题吗,非master节点在处理请求时应该会通过HA服务找到Active的Job manager, 然后向Active 
Job manager拿到结果再返回给client.

> On 7 Mar 2022, at 7:46 PM, 崔深圳  wrote:
> 
> k8s native session 模式下,配置了ha,job_manager 的数量为3,然后web ui,通过rest 
> service访问,总是路由到非master节点,有什么办法使其稳定吗?



Flink????????????

2022-03-08 文章 hjw
sql??SELECT color, sum(id) FROM T GROUP BY 
colorFlinkTgroup
 by 
key??color)??Flink???